A careful derivation of the generalized Langevin equation using "Zwanzig flavor" projection
operator formalism is presented. We provide arguments why this formalism has better properties
than alternative projection-operator formalisms for deriving non-equilibrium,
non-thermodynamic-limit equations. The two main ingredients in the derivation are Liouville's
theorem and optimal prediction theory.

As a result we find that the equations for non-equilibrium thermodynamics are dictated by the
formalism once the choice of coarse-grained variables is made. This includes a microcanonical
entropy definition dependent on the coarse-grained variables. Based on this framework we provide
a methodology for successive coarse-graining. Two special cases, the case of linear coefficients
and coarse-graining in the thermodynamic limit, are treated in detail. In the linear limit the
formulas found are equivalent to those of homogenization theory.

In this framework there are no restrictions with respect to the thermodynamic limit or
nearness to equilibrium. We believe the presented approach is very suitable for the development
of computational methods by means of coarse-graining from a more detailed level of description.
I. INTRODUCTION

This work started out as an investigation into the microscopic basis of the GENERIC
formalism [?]. The GENERIC formalism is usually presented as a generic formulation of
non-equilibrium thermodynamics. The book [3] gives a refinement of the formalism compared
to the earlier publications [1], [2], especially for the case where fluctuations are
important. Several publications [?] have already made a link between GENERIC building
blocks and microscopic expressions. However, at several points these derivations make
approximations that are not fully justified.

Instead of taking the formalism as given and trying to collect evidence for its correctness
we take a different route. Here we use a constructive approach. From the bottom up, we
construct a non-equilibrium thermodynamics theory. By performing this exercise we find many
results that are interesting in themselves. The final result is a set of equations that are
close to the equations presented by H.C. Öttinger [1]. We find that the GENERIC formalism
presents a special, important, case.

The GENERIC formalism tries to embed many of the non-equilibrium approaches that can be
found in the literature. It does not make a choice on, e.g., the use of microcanonical or
canonical ensembles. It requires that the final equations have a certain structure, e.g.,
the Poisson-structure of the reversible part of the dynamics. Some of these requirements
come from the philosophy that the coarse-grained equations should inherit as much
structure/symmetry as possible from the underlying microscopic equations of motion.

In our constructive approach we find, similarly to some of the approaches to equilibrium
thermodynamics, that the microcanonical ensemble should be taken as a starting point. We
find that in cases that are not near equilibrium the Zwanzig projection operator method
(based on the microcanonical ensemble) allows for better approximations than the
projection-operator methods which we label as Mori, Robertson and Grabert. In this sense
our approach is more restrictive than the GENERIC formalism. Once a set of coarse-grained
variables is chosen the full framework is fixed. What we are not concerned with here is the
question of what constitutes an optimal choice of coarse-grained variables.

Most of the GENERIC framework, such as the degeneracy conditions, follows from the
derivations irrespective of the choice of variables. One feature we could not prove for the
general case is the "Poisson-structure" of the reversible part of the dynamics. In practice,
however, it turns out that continuum equations such as hydrodynamic equations have this
structure. Obviously it would be very nice if the Poisson-structure always survived
coarse-graining. The claim of the GENERIC formalism seems to be that one is always able to
choose a set of suitable variables such that this structure does survive.

In our opinion, it remains to be proved that the Poisson-structure can be retained for
intermediate, mesoscopic-scale coarse-graining. This matter is outside the scope of the
current paper. We will investigate here the more general question: given a set of
coarse-grained variables, X, how to describe its dynamics? The question whether one can
always find a suitable set of coarse-grained variables such that the Poisson-structure is
found will be addressed in a future paper [?].

The method used is the projection operator approach to non-equilibrium statistical
mechanics. This approach is related to topics such as the BBGKY-hierarchy and Green-Kubo
relations. The projection operator method followed from a program of formal solutions of
the Liouville equation starting from the middle of the 20th century. Different mathematical
techniques have been used, such as diagrammatic methods [?], continued fractions [8], and
Hilbert-space approaches [?].

One of the problems when trying to apply projection operator formalism is which one to use.
Since there are many flavors, starting with a different flavor will result in a different
outcome. The outcome of all the projection-operator theories is a formally exact result.
This means the Liouville equation is cast in a different form (slightly different among
the different flavors). It might therefore seem that all are equally good starting points,
since nothing is lost. This is not the case. The equations resulting from projection
operator formalism are an alternative statement of a problem that cannot be solved
exactly. Nothing is lost, but nothing is gained either. So it seems. The strength of the
alternative statements is the possibility to make approximations. The different flavors of
projection operator formalism should therefore be judged by the quality of the
approximations one can make. The best flavor is the one that, after approximations,
produces results that are closest to the exact results.

We find that what we call the "Zwanzig flavor" is in this respect superior to the others.
We prove that many of the projection-operator methods produce results that are, after
approximation, valid only near equilibrium and in the thermodynamic limit. They are
therefore not a good basis for a general theory of non-equilibrium thermodynamics. The
main argument why the Zwanzig formalism is superior comes from optimal-prediction
theory [?].

One of the main contributions of this paper is in pointing out that the correct starting
point for non-equilibrium thermodynamics is the "Zwanzig flavor" projection operator
formalism. From this it follows that the ensemble is the microcanonical ensemble. The
entropy definition follows from this well motivated choice. Equilibrium statistical
mechanics then follows from the definition of the entropy. So here, differently from other
derivations, statistical mechanics follows from the derivation and is not used as an input.

The goal of the projection-operator formalism is to describe a system using fewer degrees
of freedom. This is coarse-graining. Coarse-graining is the second main topic to be
discussed. We will provide a method for progressive coarse-graining (of already
coarse-grained descriptions). This includes issues such as memory effects. We will derive
useful expressions relating the quantities at different levels of coarse-graining. Our
approach to coarse-graining circumvents the requirement of a wide separation of
time-scales. We will give expressions that are generally valid.

Our goal is to explicitly establish a useful theory that is valid outside of the
thermodynamic limit and far from equilibrium. However, to show that the results have the
correct limiting behavior we will also discuss the near-equilibrium linear regime and the
non-linear continuum limit. In the linear regime these relations correspond to those found
in homogenization theory [13].

A specific form of coarse-graining occurs when the thermodynamic limit is valid locally. In
this case generalized canonical ensembles do appear. A discussion of this topic will put
our approach in a wider perspective. We will briefly discuss how to relate this work to
issues such as the equivalence of ensembles (large-deviation theory [?]). Also the relation
with approaches starting from a Gibbs-entropy definition will be discussed.

A. Notation

In this paper we will mainly use an index-free notation. Most quantities can be
interpreted as columns of numbers. So for A, B, X or Y one could imagine $A^i$, $B^i$,
$X^i$ or $Y^i$ with i running from 1 to n. There is not a lot of "structure", such as a
metric, defined. The only relevant structure turns out to be the (pushforward of the)
Liouville measure or volume form. Therefore we will not assume that quantities are
tensorial per se. Where necessary, transformation rules will be provided. In particular,
entropy will turn out not to be a scalar quantity.

In many equations we will use a dot-product. The dot-product indicates a contraction over
indexes. However, since there is no metric defined, most contractions would not make sense.
The dot-product indicates allowed contractions. It indicates a dual-pairing rather than an
inner-product. We will, often implicitly, assume an upper and lower index convention to
distinguish between the dual spaces. Contractions are only possible over upper and lower
indexes of components defined on spaces which are each other's duals. To demonstrate this
convention we give the placement of the indexes of some of the quantities to be
encountered:

X\ A\ Q'^, n'^, Ay, (V2M)^, (dW)". (1) 

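This has as a consequence that we write, e.g., for a coordinate transformation or
coarse-graining of $\Omega^{ij}$,

$$ \Omega' = \frac{\partial X}{\partial x} \cdot \Omega \cdot \frac{\partial X}{\partial x}, \qquad \Omega'^{ij} = \frac{\partial X^i}{\partial x^k}\, \Omega^{kl}\, \frac{\partial X^j}{\partial x^l}. \qquad (2) $$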

Note that this convention is different from matrix multiplication, because there one would
expect a transposed matrix at the end. The transposed operator in, e.g.,
$\Omega^T = -\Omega$ indicates interchanging of indexes, i.e., $\Omega^{ji} = -\Omega^{ij}$.
Note that interchanging of upper and lower indexes does not make much sense when no metric
is defined. So if one has a quantity, the transposition would automatically imply the
interchange of the indexes i and j, since this is the only interchange allowed. Taking
these conventions into account, the presented formulas are without ambiguities.

In this paper we will solely use partial derivatives, and no functional ones. The reason
is that we believe that coarse-grained systems should be viewed as finite, by definition.
The goal of coarse-graining is to reduce the number of degrees of freedom from, say, 10^"^
to 10^. Continuum theories are smoothed theories that are not valid below a certain
length-scale. We think that, for the purpose of solving these equations using a numerical
method, it is better to write down finite equations to start with, see [?].



II. A REVIEW OF PROJECTION OPERATOR 
METHODS 

There are different flavors of projection-operator formalism. In this section we will
provide a brief outline of the main flavors. Detailed technical arguments why the Zwanzig
method is to be preferred over the alternative methods are mainly found in §III C. A
mathematical exposition of the alternative methods is provided in appendix A.

The Hilbert-space approach is the most popular nowadays and is used, e.g., in
mode-coupling theory to describe the glass-transition [16]. The Hilbert-space referred to
is not the space formed by state-vectors in quantum mechanics. The vectors in the
"Hilbert-space" are functions defined on the microscopic phase space. A "vector", say
$A(\Gamma)$, is a quantity on the microscopic phase space in the sense that a numerical
value is assigned to any micro state $\Gamma$. The definition of a Hilbert-space requires
an inner-product. Within the framework the inner-product of two functions $A(\Gamma)$ and
$B(\Gamma)$ is the expectation value of the product $A^* B$ with respect to a (generalized)
canonical ensemble.

The Liouville operator acts on the vectors/functions in the Hilbert space. It is often
referred to as a super-operator [10] since, in the quantum-mechanical setting, it acts on
observables and not on state-vectors. Once a Hilbert-space is formed the formalism proceeds
in a similar way for the classical and quantum-mechanical case.

A collection of macro states defines a linear subspace of the Hilbert-space. The base
vectors are given by functions $X^i(\Gamma)$. Here $X^i(\Gamma)$ is a macroscopic state
corresponding to a microscopic state $\Gamma$. A finite number of macroscopic states
$i = 1, \ldots, n$ are considered. The "projection" of a general vector now consists of a
projection onto this subspace.

Although the Hilbert-space approach is attractive from 
a formal point of view it has drawbacks. The most im- 
portant drawback is that one needs to define an inner- 
product on the space as a starting point. Irrespective of 
the definition of the inner-product chosen one will get a 
formally exact result. Clearly a formal result is not necessarily a practically useful
result. One needs to address the matter of which choice gives practically useful results.
The usual starting point is to use a canonical ensemble for this. The Hilbert-space
formulation using the equilibrium canonical ensemble results in the Mori-formalism.

The conceptual difficulty with defining an inner-product on the Hilbert-space is partly
circumvented by taking a different point of view. A more physical point of view is to
interpret the projections as expectation values. This is the point of view taken by
Grabert [?]. As is shown in appendix A, choosing equilibrium canonical expectation values
results in the Mori-formalism.

Clearly, when interested in non-equilibrium phenomena it does not make much sense to
consider equilibrium expectation values. If one tries to describe projections as
expectation values using generalized canonical ensembles, however, one runs into trouble.
To form valid projection operators, linearizations have to be performed that seem
unsatisfactory from a physical point of view. This is pointed out in detail in appendix A.
Proceeding nonetheless with the linearized canonical distribution, a generalized
Langevin-type equation is obtained.

This "Langevin" equation contains a fluctuating term. The step from a formally exact
result to a practically useful result is made by replacing the fluctuating (deterministic)
term by a stochastic process. We will point out in the main text that the step of modeling
this fluctuating term as a stochastic process is not allowed for the generalized canonical
ensemble. The derived stochastic equation is valid only near equilibrium. A second
critique is that in Grabert's approach the use of the generalized canonical ensemble is
motivated from outside the theory. It comes from statistical mechanics reasoning using a
Gibbs entropy. It is not clear, a priori, why this is the correct ensemble to use at a
small scale where fluctuations are important.

The use of the generalized canonical ensemble is in a sense illogical when one can also
define a micro-canonical one. This choice was made in the historic derivation by Zwanzig
of a generalized Fokker-Planck equation [?]. By making this choice many things fall into
place. The awkwardness in the derivation disappears. The projections can be interpreted as
conditional expectation values without applying linearizations. The expectation values are
in accordance with the optimal prediction framework [11, ?]. The fluctuating term has the
correct properties that allow it to be approximated by a stochastic process.

Below we will turn the story around. When performing a projection there are important
reasons to use the micro-canonical ensemble. The main reason is that it is the optimal
choice for computing conditional expectation values. To define a conditional expectation
value one needs to have an invariant measure. Classical mechanics provides this measure,
namely, the Liouville measure. Continuing from this stage one finds a fundamental
definition of the entropy. It turns out that $\exp[S(X)]$ is the Liouville measure of phase
space per unit volume of coarse-grained space, i.e., a density of states. If one likes
mathematical terminology, it can be defined more rigorously as the Radon-Nikodym
derivative, with respect to the Lebesgue measure in the coarse-grained space, of the
pushforward of the Liouville measure to that coarse-grained space. Because of this
definition entropy is not a scalar quantity ($\exp[S(X)]$ is a density). Careful
application of projection-operator formalism gives us equilibrium and non-equilibrium
thermodynamics.
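
As a hedged illustration of the density-of-states reading of $\exp[S(X)]$, the sketch
below replaces the Liouville measure by a plain counting measure on a hypothetical system
of n two-state units, with X the number of units in state 1. The toy system and the name
`entropy_by_counting` are assumptions made for illustration only; they do not appear in
the derivation.

```python
import math
from itertools import product


def entropy_by_counting(n):
    """Return {X: S(X)} with exp[S(X)] = number of microstates mapped to X.

    Assumption: a counting measure on {0, 1}^n stands in for the Liouville
    measure, and the coarse-grained variable is X = sum of the n units.
    """
    counts = {}
    for microstate in product((0, 1), repeat=n):
        x = sum(microstate)                 # many-to-one map X(Gamma)
        counts[x] = counts.get(x, 0) + 1    # measure pushed forward to X
    return {x: math.log(c) for x, c in counts.items()}


entropy = entropy_by_counting(10)
```

In this toy case the count per macrostate is a binomial coefficient, so the entropy peaks
at X = n/2. Relabeling X would change the count per unit of X, which is the discrete
analogue of the statement that entropy is not a scalar quantity.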



III. THE NONLINEAR LANGEVIN EQUATION 

The derivation of the nonlinear Langevin equation using projection operator formalism can
be found in many standard texts and papers [10], [12]. Here we provide a straightforward
derivation. It is inspired by a derivation given in [?].



A. The Liouville operator 

For any physical quantity A the time development is described by means of a Liouville
operator $\mathcal{L}$, formally,

$$ \frac{d}{dt} A_t = \mathcal{L} A_t. \qquad (3) $$

It has the formal solution,

$$ A_t = \exp[\mathcal{L} t]\, A_0. \qquad (4) $$

In classical mechanics the quantity $A_t$ is fully specified by the microscopic state,
$\Gamma_t$, of the system, so $A_t(\Gamma_0) = A(\Gamma_t)$.

Also in quantum mechanics observables are completely determined by the time evolution of
the initial microscopic state of the system. In that case $A_t$ would be an operator
evolving according to the Heisenberg description of the time evolution. In both cases, one
finds for a product of quantities that

$$ (AB)_t = A_t B_t \;\rightarrow\; \exp[\mathcal{L}t]\,(A_0 B_0) = (\exp[\mathcal{L}t] A_0)\,(\exp[\mathcal{L}t] B_0) \;\rightarrow\; \mathcal{L}(A_t B_t) = (\mathcal{L}A_t)\, B_t + A_t\, (\mathcal{L}B_t). \qquad (5) $$

Here the last product (or Leibniz) rule is derived by taking the time derivative of the
preceding identity. We will encounter the Leibniz or product rule several times.

The property that makes the product rule very useful is that

$$ \exp[\mathcal{L}t]\, f(A_0) = f(\exp[\mathcal{L}t]\, A_0) \qquad (6) $$

is valid if the product rule is valid (for holomorphic functions f). This rule might seem
evident, in classical mechanics, if one considers trajectories through phase space.

One can, however, write many systems, e.g., partial differential equations (first order in
time), in the formal way of eq. (3) by introducing a more general Liouville operator. In
this case the product rule does not necessarily hold. Also, the Liouville operators
produced by projection-operator formalism do not automatically obey the product rule.
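
The distinction can be checked numerically. The following is a minimal sketch, assuming
two hypothetical one-dimensional operators that are not taken from the text: a first-order
transport operator $(\mathcal{L}f)(x) = v(x) f'(x)$, which is a derivation, and a
second-order operator $f \mapsto f''$ of the kind that appears in diffusion-type equations,
which is not. Derivatives are approximated by central finite differences.

```python
import math


def derivative(f, x, h=1e-4):
    """Central finite-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


def second_derivative(f, x, h=1e-3):
    """Central finite-difference approximation of f''(x)."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)


def velocity(x):
    """Hypothetical velocity field v(x), chosen only for illustration."""
    return 1.0 + 0.5 * math.sin(x)


def transport(f):
    """First-order operator (L f)(x) = v(x) f'(x); obeys the product rule."""
    return lambda x: velocity(x) * derivative(f, x)


def diffusion(f):
    """Second-order operator (L f)(x) = f''(x); violates the product rule."""
    return lambda x: second_derivative(f, x)


def leibniz_defect(op, f, g, x):
    """Return op(f g) - [(op f) g + f (op g)] evaluated at x."""
    fg = lambda y: f(y) * g(y)
    return op(fg)(x) - (op(f)(x) * g(x) + f(x) * op(g)(x))


x0 = 0.7
defect_transport = leibniz_defect(transport, math.sin, math.exp, x0)
defect_diffusion = leibniz_defect(diffusion, math.sin, math.exp, x0)
```

For the transport operator the defect vanishes up to discretization error, whereas for the
second-order operator it equals the cross term $2 f'(x) g'(x)$.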

Often one is interested in an ensemble of microscopic systems or in, e.g., time averages
of a quantity. For these cases it is convenient to introduce a dual object, $\mu$, that
weighs the microstates. The pairing of a quantity A and the dual $\mu$ will be denoted as
$\langle A, \mu \rangle$, which gives a (column of) number(s). Let an operator, say
$\mathcal{L}$ (but it can be any operator), work on A; then the conjugated operator is
defined by

$$ \langle \mathcal{L}A, \mu \rangle = \langle A, \mathcal{L}^\dagger \mu \rangle. \qquad (7) $$

As a consequence, by using a series expansion for the exponential,

$$ \langle \exp[\mathcal{L}t]\, A, \mu \rangle = \langle A, \exp[\mathcal{L}^\dagger t]\, \mu \rangle. \qquad (8) $$

So, if $\langle A_t, \mu \rangle$ is interpreted as an expectation value with respect to an
initial ensemble of microstates then

$$ \langle A_t, \mu \rangle = \langle A, \mu_t \rangle \quad \text{with} \quad \mu_t = \exp[\mathcal{L}^\dagger t]\, \mu. \qquad (9) $$

Here fixed microstates are weighted by an evolving ensemble. This is similar to the
Heisenberg and Schrödinger pictures in quantum mechanics. We will mainly work in the
"Heisenberg" picture, evolving A and weighting with respect to the initial states.
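
A finite-dimensional stand-in makes the duality concrete. In the sketch below (an
illustration under stated assumptions, not part of the derivation) a quantity A is a list
of values on three microstates, $\mu$ is a list of weights, the pairing is the weighted
sum, and the conjugated operator is the matrix transpose. The generator `L`, the vectors
and the Taylor-series helper `expm` are hypothetical choices.

```python
def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def transpose(a):
    return [list(row) for row in zip(*a)]


def apply(m, v):
    """Matrix-vector product."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]


def expm(a, t, terms=40):
    """exp(a t) by a truncated Taylor series (adequate for small norms)."""
    n = len(a)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        scaled = [[a[i][j] * t / k for j in range(n)] for i in range(n)]
        term = matmul(term, scaled)
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return result


def pairing(a, mu):
    """Dual pairing <A, mu> as a weighted sum over microstates."""
    return sum(x * w for x, w in zip(a, mu))


# Hypothetical generator, quantity and initial ensemble.
L = [[-1.0, 1.0, 0.0], [0.0, -1.0, 1.0], [1.0, 0.0, -1.0]]
A0 = [1.0, 2.0, 4.0]
mu0 = [0.5, 0.3, 0.2]
t = 0.8

# "Heisenberg" picture: evolve A, weigh with the initial ensemble.
heisenberg = pairing(apply(expm(L, t), A0), mu0)
# "Schroedinger" picture: keep A fixed, evolve the ensemble with L^dagger.
schroedinger = pairing(A0, apply(expm(transpose(L), t), mu0))
```

Both pictures give the same expectation value up to rounding, which is eq. (9) in this
finite setting.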

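When the operator $\mathcal{L}$ is a derivation (i.e., obeys the product rule), then
combining eq. (7) and the product rule, eq. (5), gives

$$ \langle (\mathcal{L}A)\, B, \mu \rangle = \langle \mathcal{L}(A B), \mu \rangle - \langle A\, \mathcal{L}B, \mu \rangle = \langle A B, \mathcal{L}^\dagger \mu \rangle - \langle A\, \mathcal{L}B, \mu \rangle. \qquad (10) $$

This will be used in the derivations of the decomposed dynamics below.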

We will focus on the classical description. In a classical mechanics setting the
microscopic evolution can always be thought of as a trajectory through phase space,
parametrized by $\Gamma_t$. Up to now expression eq. (4) was a formal solution of eq. (3).
For points in phase space the operator is well defined, since

$$ \Gamma_t = \exp[\mathcal{L}t]\, \Gamma_0. \qquad (11) $$

Quantities are functions from phase space to $\mathbb{R}^n$. For the quantity $A(\Gamma)$
we have the formal relation

$$ A_t(\Gamma_0) = \exp[\mathcal{L}t]\, A(\Gamma_0) = A(\Gamma_t). \qquad (12) $$

An ensemble of initial states can be characterized by a measure $\mu$. The expectation
value of a quantity with respect to this measure is

$$ \langle A_t, \mu \rangle = \int A_t(\Gamma_0)\, d\mu[\Gamma_0]. \qquad (13) $$

Since $\mathcal{L}$ itself does not depend on time, a shift of the time index by a value
$-t$ and using definition eq. (12) gives

$$ \int A_t(\Gamma_0)\, d\mu[\Gamma_0] = \int A(\Gamma_t)\, d\mu[\Gamma_0] = \int A(\Gamma_0)\, d\mu[\Gamma_{-t}]. \qquad (14) $$
By comparing this expression with the definition, eq. (8), for a subset of phase space
points B (taken from the $\sigma$-algebra corresponding to the measure) one obtains



(15) 



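Using this equation, and the chain rule of differentiation, one finds that

$$ (\mathcal{L}A_t)(\Gamma_0) = \dot{A}_t(\Gamma_0) = \dot{\Gamma}_t \cdot \frac{\partial A(\Gamma_t)}{\partial \Gamma_t} = \dot{\Gamma}_0 \cdot \frac{\partial A_t(\Gamma_0)}{\partial \Gamma_0}, \qquad (16) $$
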
if $A(\Gamma)$ is differentiable. Performing partial integration of
$\langle \mathcal{L}A, \mu \rangle$, or using eq. (15), one finds

(17)

For classical mechanics, when the coordinates parametrizing phase space, $\Gamma^c$, are
canonical, Liouville's theorem holds. Liouville's theorem states that microscopic phase
space is incompressible,

$$ \frac{\partial}{\partial \Gamma^c} \cdot \dot{\Gamma}^c = 0. \qquad (18) $$

Using this observation one can define the Liouville measure for the parametrization with
variables $\Gamma$ by making a coordinate transformation from the canonical variables
$\Gamma^c$ to $\Gamma$,

$$ d\mu_L[\Gamma] = d\Gamma^c = \det\left( \frac{\partial \Gamma^c}{\partial \Gamma} \right) d\Gamma, \qquad (19) $$

where $d\Gamma$ is used to denote the Lebesgue measure. This measure gives the usual
volume of a (hyper)cube. For this Liouville measure we have that

$$ \mathcal{L}^\dagger \mu_L = 0. \qquad (20) $$

Measures that obey this property are called invariant. Therefore the Liouville measure
$\mu_L$ is an invariant measure. When monitoring the weight of a set $B_t$ evolving in
time according to $\mathcal{L}$ then, using eq. (15), this weight is constant if the
measure is invariant.

When starting from a measure $\mu_0$ and computing an expectation value by using both
ensemble and time averaging one finds a time-averaged measure

$$ \bar{\mu} = \lim_{T \to \infty} \frac{1}{T} \int_0^T \mu_t\, dt = \lim_{T \to \infty} \frac{1}{T} \int_0^T \exp[\mathcal{L}^\dagger t]\, \mu_0\, dt. \qquad (21) $$

If this average measure exists then for $T \to \infty$

$$ \mathcal{L}^\dagger \bar{\mu} = \lim_{T \to \infty} \frac{1}{T} \int_0^T \mathcal{L}^\dagger \exp[\mathcal{L}^\dagger t]\, \mu_0\, dt = \lim_{T \to \infty} \frac{\mu_T - \mu_0}{T} = 0, \qquad (22) $$

so $\bar{\mu} \in \mathrm{Null}(\mathcal{L}^\dagger)$. If eq. (20) has a unique solution
(up to a multiplying constant), i.e., the null-space of $\mathcal{L}^\dagger$ is
one-dimensional, then this measure is necessarily equal to $\mu_L$. This uniqueness of the
time-average, irrespective of the initial condition, is called ergodicity. Because of the
existence of conserved quantities such as energy, trajectories in phase space always
remain inside a subspace. Therefore ergodicity is usually interpreted as uniqueness of the
solution of eq. (20) on the subspaces defined by conserved quantities.

Some kind of ergodicity is generally believed to be important for the fundamentals of
statistical mechanics. There are, however, many difficulties with this. Moreover, here we
are interested in dynamic behavior. Therefore very long time-intervals cannot be
considered. One of the main goals of the present paper is to proceed as far as possible
without making ergodicity assumptions.
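
The incompressibility expressed by Liouville's theorem can be illustrated numerically. The
sketch below is only an illustration under stated assumptions: a hypothetical pendulum
with $H = p^2/2 - \cos q$ in canonical coordinates (q, p), a leapfrog integrator standing
in for the exact flow (its map is itself area preserving), and a finite-difference
estimate of the Jacobian determinant of the map $\Gamma_0 \to \Gamma_t$. A determinant of
one means that the Lebesgue measure in canonical coordinates is carried along unchanged.

```python
import math


def leapfrog(q, p, dt, steps):
    """Leapfrog integration of the pendulum H = p^2/2 - cos(q)."""
    for _ in range(steps):
        p -= 0.5 * dt * math.sin(q)   # half kick: dp/dt = -dH/dq = -sin(q)
        q += dt * p                   # drift:     dq/dt =  dH/dp = p
        p -= 0.5 * dt * math.sin(q)   # half kick
    return q, p


def flow_jacobian_det(q, p, dt=0.01, steps=500, h=1e-5):
    """Determinant of d(q_t, p_t)/d(q_0, p_0) by central differences."""
    qa, pa = leapfrog(q + h, p, dt, steps)
    qb, pb = leapfrog(q - h, p, dt, steps)
    qc, pc = leapfrog(q, p + h, dt, steps)
    qd, pd = leapfrog(q, p - h, dt, steps)
    dq_dq, dp_dq = (qa - qb) / (2 * h), (pa - pb) / (2 * h)
    dq_dp, dp_dp = (qc - qd) / (2 * h), (pc - pd) / (2 * h)
    return dq_dq * dp_dp - dq_dp * dp_dq


det = flow_jacobian_det(1.0, 0.3)
```

The determinant stays at one although individual trajectories move appreciably, so the
weight of an evolving set of phase-space points is constant under this measure.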



B. Decomposition of the Dynamics

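Let the "real" dynamics of the system be generated by a Liouville operator $\mathcal{L}$.
Let us now consider a quantity $A_t^{\rm fluct}$ that follows this dynamics approximately.
The difference between the "real" time evolution of an initial state, given by
$\exp[\mathcal{L}t]\, A_0^{\rm fluct}$, and $A_t^{\rm fluct}$ is

$$ \exp[\mathcal{L}t]\, A_0^{\rm fluct} - A_t^{\rm fluct} = \int_0^t \frac{d}{dt'} \left( \exp[\mathcal{L}t']\, A_{t-t'}^{\rm fluct} \right) dt' = \int_0^t \exp[\mathcal{L}t'] \left( \mathcal{L} + \frac{\partial}{\partial t'} \right) A_{t-t'}^{\rm fluct}\, dt'. \qquad (23) $$

We use the superscript "fluct" to indicate fluctuating or rapid dynamics, to be
distinguished from the slow dynamics.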

Within projection operator formalism the Liouville operator is decomposed as

$$ \mathcal{L} = \mathcal{P}\mathcal{L} + \mathcal{Q}\mathcal{L}. \qquad (24) $$

In the derivations occurring in the body of this paper we use the definitions

$$ A_t^{\rm fluct} = \exp[\mathcal{Q}\mathcal{L}t]\, A_0 \quad \text{and} \quad A_t = \exp[\mathcal{L}t]\, A_0. \qquad (25) $$

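Inserting these definitions into eq. (23) one obtains the decomposed equation

$$ \Delta A_t = \int_0^t \exp[\mathcal{L}t']\, \mathcal{P}\mathcal{L}\, A_{t-t'}^{\rm fluct}\, dt' + \Delta A_t^{\rm fluct} = \int_0^t \exp[\mathcal{L}t']\, \mathcal{P}\mathcal{L}\, A_0\, dt' + \int_0^t \exp[\mathcal{L}t']\, \mathcal{P}\mathcal{L}\, \Delta A_{t-t'}^{\rm fluct}\, dt' + \Delta A_t^{\rm fluct}, \qquad (26) $$

where $\Delta A_t^{\rm fluct} = A_t^{\rm fluct} - A_0^{\rm fluct}$ (similarly
$\Delta A_t = A_t - A_0$). The fluctuating dynamics is a solution of

$$ \frac{d}{dt} A_t^{\rm fluct} = \mathcal{Q}\mathcal{L}\, A_t^{\rm fluct}, \qquad (27) $$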

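with initial value $A_0^{\rm fluct} = A_0$. An alternative definition, often found in the
literature, is to take the initial condition $\tilde{A}_0^{\rm fluct} = \mathcal{Q} A_0$.
We will denote this definition using a tilde. The solution is then

$$ \tilde{A}_t^{\rm fluct} = \exp[\mathcal{Q}\mathcal{L}t]\, \mathcal{Q} A_0. \qquad (28) $$

For this definition one finds the decomposition

$$ A_t = \exp[\mathcal{L}t]\, \mathcal{P} A_0 + \int_0^t \exp[\mathcal{L}s]\, \mathcal{P}\mathcal{L}\, \tilde{A}_{t-s}^{\rm fluct}\, ds + \tilde{A}_t^{\rm fluct}. \qquad (29) $$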

Within the projection operator formalism the operators $\mathcal{P}$ and $\mathcal{Q}$ are
projection operators. They have the properties

$$ \mathcal{P} = \mathcal{P}^2, \quad \mathcal{Q} = \mathcal{Q}^2 \quad \text{and} \quad \mathcal{P} + \mathcal{Q} = 1. \qquad (30) $$

By convention $\mathcal{P}$ is supposed to filter out the slow (or "relevant") dynamics
and $\mathcal{Q}$ the fast (fluctuating or "irrelevant") dynamics.

As a consequence of the dynamics and the initial value, combined with the projection
property of the operator, one finds

$$ \mathcal{P} \tilde{A}_t^{\rm fluct} = 0. \qquad (31) $$

For the fluctuating contributions as we have defined them the property is a bit weaker,
namely,

$$ \mathcal{P} A_t^{\rm fluct} = \mathcal{P} A_0 \quad \text{such that} \quad \mathcal{P} \Delta A_t^{\rm fluct} = 0. \qquad (32) $$

The purpose of the projection operators is to filter out "irrelevant" contributions. In
coarse-grained descriptions one wants to describe the problem in terms of coarse-grained
variables that have the property $X = \mathcal{P}X$. This means they are invariant under
projection.

In most derivations of projection-operator formalisms $\tilde{A}_t^{\rm fluct}$ instead of
$A_t^{\rm fluct}$ is used. Note that the equation for $A = X$ as given by eq. (29) is
trivial. Since $\tilde{X}_0^{\rm fluct} = \mathcal{Q}X = 0$ one finds that
$\tilde{X}_t^{\rm fluct} = 0$. Therefore the usual approach to obtain an equation for the
change of X is to consider first $A = \dot{X} = \mathcal{L}X$. Next, to obtain a change of
X one integrates $\dot{X}$ over a time interval. An alternative approach is to consider
eq. (26) with $A = X$. This will give exactly the same equation without the need for the
time integration. Also eq. (26) is a little bit more convenient as a starting point for an
approximation using stochastic processes.
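
The decomposition eq. (29) and the property eq. (31) are purely algebraic and can be
checked in a finite-dimensional setting. The sketch below is an illustration under stated
assumptions: the 3x3 generator `L`, the rank-one projector `P`, the initial vector `A0`,
the Taylor-series helper `expm` and the Simpson quadrature of the memory integral are all
hypothetical choices, not taken from the derivation.

```python
def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def matvec(m, v):
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def expm(a, t, terms=30):
    """exp(a t) by a truncated Taylor series (adequate for small norms)."""
    n = len(a)
    result, term = identity(n), identity(n)
    for k in range(1, terms):
        scaled = [[a[i][j] * t / k for j in range(n)] for i in range(n)]
        term = matmul(term, scaled)
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return result


# Hypothetical generator and oblique projector P = u v^T with v.u = 1.
L = [[0.0, 1.0, 0.3], [-1.0, -0.2, 0.5], [0.4, -0.5, -0.1]]
u, v = [1.0, 1.0, 0.0], [0.5, 0.5, 0.0]
P = [[u[i] * v[j] for j in range(3)] for i in range(3)]
Q = [[identity(3)[i][j] - P[i][j] for j in range(3)] for i in range(3)]
PL, QL = matmul(P, L), matmul(Q, L)
A0 = [1.0, -2.0, 0.5]

t, n = 1.0, 200          # final time and (even) number of Simpson intervals
h = t / n
U, V = [identity(3)], [identity(3)]   # U[k] = exp(L k h), V[k] = exp(QL k h)
Uh, Vh = expm(L, h), expm(QL, h)
for _ in range(n):
    U.append(matmul(U[-1], Uh))
    V.append(matmul(V[-1], Vh))

QA0 = matvec(Q, A0)
memory = [0.0, 0.0, 0.0]
for k in range(n + 1):
    weight = 1.0 if k in (0, n) else (4.0 if k % 2 else 2.0)
    # Integrand exp[L s] P L exp[QL (t - s)] Q A_0 at s = k h.
    f = matvec(U[k], matvec(PL, matvec(V[n - k], QA0)))
    memory = [memory[i] + weight * h / 3.0 * f[i] for i in range(3)]

fluct = matvec(V[n], QA0)                       # tilde A_t^fluct, eq. (28)
lhs = matvec(U[n], A0)                          # A_t
rhs = [a + b + c for a, b, c in
       zip(matvec(U[n], matvec(P, A0)), memory, fluct)]
residual = max(abs(x - y) for x, y in zip(lhs, rhs))
projected_fluct = max(abs(x) for x in matvec(P, fluct))
```

Up to quadrature error the two sides of eq. (29) agree, and the projection of the
fluctuating term vanishes as stated in eq. (31).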

In the derivations presented in appendix A, which compares the alternative methods, we
will consistently use $\tilde{A}_t^{\rm fluct}$ instead of $A_t^{\rm fluct}$. The main
reason is that the Robertson/Grabert approach is difficult to express otherwise. In the
body of the paper, where we focus on the Zwanzig operator formalism, using
$A_t^{\rm fluct}$ turns out to be more convenient.

C. Problems of the canonical-based formalisms 

To proceed further one needs to define a projection operator, $\mathcal{P}$. In this paper
we will argue that the projection operator as defined by Zwanzig is to be preferred. Other
versions can be seen as near-equilibrium, or thermodynamic-limit, approximations.
Derivations usually start from the (generalized) canonical ensemble. This starting point
is understandable from the point of view of statistical mechanics. The canonical ensemble
is much easier to handle.

The derivations based on this approximation we have labeled the Mori, Robertson and
Grabert derivations. They are outlined in appendix A. Before we start discussing the
Zwanzig formalism we want to discuss the properties of these other flavors that make them
unsuited as a starting point for a general framework of coarse-graining. The core problem
with these derivations is the following. One micro state $\Gamma$ is associated to one
macro state, via a many-to-one transformation $X(\Gamma)$. The canonical ensemble,
however, associates one micro state $\Gamma$ to more than one macro state X. The
generalized canonical ensemble used to define a conditional measure, $\mu^{C}(X)$ [?],
associates a significant weight not only to the microstates with $X(\Gamma) = X$, but to
any $\Gamma$ with $X(\Gamma)$ near to X. So a micro state contributes to more than one
macro state X.

One can try to define a projection as a conditional expectation value,
$\mathcal{P}^{C} X = \langle X, \mu^{C}(X) \rangle$. By construction this operator has the
property

$$ \mathcal{P}^{C} X = X. \qquad (33) $$

However, for general nonlinear functions $g(X)$,

$$ (\mathcal{P}^{C} g)(X) \neq g(X). \qquad (34) $$

The reason is the mentioned asymmetry in association between microstates $\Gamma$ and
macrostates X.

For the canonical ensemble, applying the conditional expectation value several times gives
$\mathcal{P}^{C}\mathcal{P}^{C} \neq \mathcal{P}^{C}$. Therefore the canonical expectation
value does not define a projection. This is a consequence of eq. (34). The projection
property is important because it results in eq. (31) and eq. (32). If $\mathcal{P}^{C}$ is
used the "projection" of the fluctuating contribution is thus non-zero. Its expectation
value (according to the canonical ensemble) is non-zero.
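
The loss of the projection property can be seen in a deliberately simple toy model. In the
sketch below a smeared expectation over neighboring macrostates stands in for the
generalized canonical conditional expectation. The three-point kernel with weights
(w, 1 - 2w, w) is a hypothetical choice made only for illustration; it reproduces the two
features discussed above, namely that the mean of X is preserved while nonlinear functions
of X are not.

```python
def smeared_expectation(g, w=0.2):
    """Hypothetical stand-in for P^C acting on a function g of X.

    Each macrostate X receives weight from microstates whose X(Gamma)
    equals X - 1, X or X + 1, with weights w, 1 - 2w and w.
    """
    return lambda x: w * g(x - 1) + (1 - 2 * w) * g(x) + w * g(x + 1)


def linear(x):
    return x


def square(x):
    return x * x


x = 3.0
# The kernel preserves the mean of X, as in eq. (33).
linear_defect = smeared_expectation(linear)(x) - linear(x)
# A nonlinear function of X is not reproduced, as in eq. (34).
nonlinear_defect = smeared_expectation(square)(x) - square(x)
# Applying the operation twice differs from applying it once.
once = smeared_expectation(square)
twice = smeared_expectation(once)
idempotency_defect = twice(x) - once(x)
```

With w = 0.2 both the nonlinear defect and the idempotency defect equal 2w, so in this toy
model the smeared expectation is not a projection.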

Within the Mori and the Robertson-Grabert formalisms the "projection" property is restored
by applying a linearization. Here,

$$ (\mathcal{P} g)(X) = g(x^{\rm eq}) + (X - x^{\rm eq}) \cdot \frac{\partial g}{\partial x}(x^{\rm eq}). \qquad (35) $$

Because of property eq. (33), applying the projection many times gives the same result.
Therefore the Mori and Robertson/Grabert projections are genuine projection operators. The
applied linearization means that, if one views the projection as taking an expectation
value, the ensemble used is not canonical anymore. The status is now that for the
fluctuations eq. (31) is obeyed. So the expectation value of the fluctuations is zero.
However, one can doubt whether much is gained by the fact that the expectation value is
zero with respect to a non-canonical ensemble.
Within the Mori approach the equilibrium value $x^{\rm eq}$ is fixed. The Mori formalism
is arrived at by means of a linearization. As already mentioned it can be restated in a
Hilbert-space formalism [?]. Expectation values using the (generalized) canonical ensemble
are then used to define an inner product. The inner product is used to define a
projection. The macroscopic variables $X^i(\Gamma)$ are vectors in Hilbert-space that
define the subspace that is projected onto. This mathematical elegance is one of the
reasons for the popularity of the Mori-formalism.

In the Robertson and Grabert approaches the value of $x^{\rm eq}$ is made to change in
time. The Robertson and Grabert approaches can be seen as damage-control because of the
loss of the canonical ensemble. They reduce the error made by the linearization. This is
at the expense of complicating the framework by the need to introduce a time-dependent
projection operator, see appendix A.

The treatments of Mori, Robertson and Grabert restore the projection property. Because of
this the framework of the projection operator method can be used. Therefore formally exact
generalized Langevin equations can be derived. However, the non-equality eq. (34) remains.
As we will argue later, eq. (34) excludes the formalisms as a basis for further
(stochastic) approximations. This fact seems to be missed in most of the literature.

The main shortcoming of the Mori formalism, eq. (34), can also be alleviated in an
alternative way (i.e., different from the Robertson and Grabert "improvements"). This is
by extending the basis of the subspace. One example of such an extension is to use not
only the functions $X^i(\Gamma)$, but also quadratic ones, $X^i(\Gamma) X^j(\Gamma)$.
Clearly in this way an equality is obtained, instead of the inequality eq. (34), not only
for linear functions but also for quadratic ones. To suit a particular problem one can use
any basis. This choice of basis makes the Mori formalism very versatile.

Many times the different projection operator formalisms are presented as a choice dictated by convenience. In analyzing different methods we came to the conclusion that these are alternative methods to battle the consequences of eq. (34) (while keeping the projection property). The one thing in common is that the starting point is always the (generalized) canonical ensemble.
All have also in common that the projection operator 
obtained can not be interpreted as an exact expectation 
value using this ensemble. 



IV. THE ZWANZIG FORMALISM 

The Zwanzig formalism ^Tsll , which was historically the 
first projection operator formalism introduced, uses the 
generalized microcanonical ensemble instead of the (gen- 
eralized) canonical one. One could also view this as just 
a choice one can make. From statistical mechanics one 
might have the idea that there is little difference because 
of the equivalence of ensembles. This is all true, but only 



in the thermodynamic limit and close to equilibrium. 

We will turn the story inside-out. Because of the cen- 
tral importance of 

\[ (\mathcal{P} g)(X) = g(X), \tag{36} \]

let us try to find a projection operator formalism that 
obeys this equality. This turns out to be the Zwanzig 
operator formalism and it turns out to use the (general- 
ized) microcanonical ensemble! 

The big picture is as follows. Starting from a theory 
of coarse-graining, i.e. projection from the micro state 
to a macro state one finds the microcanonical ensemble. 
This theory gives a generalized Langevin equation that has superior qualities compared to the Mori one (and its extensions) because of eq. (36). Coarse-graining therefore dictates the microcanonical ensemble and from this the microcanonical definition of entropy. Therefore, also
statistical mechanics follows from this theory of coarse- 
graining and not the other way around. The transition 
from the microcanonical ensemble to the canonical fol- 
lows from large-deviation theory [3, [13, HH ■ 

To find a projection operator that obeys eq. (36) we
will take the point of view of optimal prediction theory 
(Til [T^ . Here we will start out with a general measure 
$\mu$, i.e., not necessarily the Liouville measure. Using this measure we will try to find an optimal prediction. Say we want to find a prediction or projection of $A(\Gamma)$ by means of a function $f(X(\Gamma))$ that depends on $\Gamma$ through $X$. The prediction is said to be optimal (with respect to $\mu$) when

\[ \langle |A - f(X)|^2, \mu \rangle = \text{minimal}. \tag{37} \]

This optimum is given by $f(X) = \mathrm{E}_\mu(A|X)$, the conditional expectation value. It has the defining property that for any function $g(X(\Gamma))$,

\[ \langle g(X)\,(A - \mathrm{E}_\mu(A|X)), \mu \rangle = 0. \tag{38} \]

Using this property the cross term cancels in:

\[
\begin{aligned}
\langle |A - f(X)|^2, \mu \rangle
 &= \langle |A - \mathrm{E}_\mu(A|X)|^2, \mu \rangle \\
 &\quad + 2\,\mathrm{Re}\,\langle (\mathrm{E}_\mu(A|X) - f(X))^{*}\,(A - \mathrm{E}_\mu(A|X)), \mu \rangle \\
 &\quad + \langle |\mathrm{E}_\mu(A|X) - f(X)|^2, \mu \rangle \\
 &= \langle |A - \mathrm{E}_\mu(A|X)|^2, \mu \rangle + \langle |\mathrm{E}_\mu(A|X) - f(X)|^2, \mu \rangle
\end{aligned}
\tag{39}
\]

and therefore $f(X) = \mathrm{E}_\mu(A|X)$ is indeed optimal. In a similar way one can prove it is unique (in a square-integrable sense). Clearly, from eq. (37) it follows that if $A$ is a function of $X$, the optimum is just $\mathrm{E}_\mu(A|X) = A(X)$. Therefore, for any $A$, $\mathrm{E}_\mu(\mathrm{E}_\mu(A|X)|X) = \mathrm{E}_\mu(A|X)$. This is the defining property of a projection operator

\[ (\mathcal{P}_\mu A)(X) = \mathrm{E}_\mu(A|X), \tag{40} \]

where $\mu$ is a measure on the microscopic space. This is the Zwanzig projection operator. The Zwanzig projection operator obeys eq. (36) for general functions $g(X)$, whereas the Mori and Robertson/Grabert projection operators do not. The optimality of the Zwanzig projection is a strong reason for preferring it. The value does depend, however, on which measure $\mu$ is used. The conclusion here is that for a specified measure, say $\mu_L$, the Zwanzig projection gives the optimal projection, better than, e.g., the Mori or Robertson/Grabert projection.
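The optimality and projection properties can be checked numerically on a toy model. The following is a minimal sketch, not part of the original derivation: the discrete coarse variable `x`, the observable `a` and the helper `cond_exp` are hypothetical choices made only so that conditional expectations reduce to bin averages over an empirical measure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy "microstate": a discrete coarse label plus hidden noise.
n = 200_000
x = rng.integers(0, 5, size=n)          # coarse-grained variable X(Gamma)
hidden = rng.normal(size=n)             # unresolved degrees of freedom
a = x**2 + hidden * (1 + x)             # observable A(Gamma)

def cond_exp(values, labels, n_labels=5):
    """Empirical E(values | labels), evaluated at every sample."""
    sums = np.bincount(labels, weights=values, minlength=n_labels)
    counts = np.bincount(labels, minlength=n_labels)
    return (sums / counts)[labels]

pa = cond_exp(a, x)                      # (P A)(X) = E(A|X)
mse_opt = np.mean((a - pa) ** 2)         # error of the conditional expectation
linear_fit = np.polyval(np.polyfit(x, a, 1), x)
mse_lin = np.mean((a - linear_fit) ** 2) # error of the best linear f(X)

g = np.sin(x)                            # an arbitrary function of X
orth = np.mean(g * (a - pa))             # eq. (38): vanishes up to round-off
idem = np.max(np.abs(cond_exp(pa, x) - pa))    # P(P A) = P A
proj_g = np.max(np.abs(cond_exp(g, x) - g))    # eq. (36): P g = g
```

In this sketch the linear fit plays the role of a projection onto a restricted (Mori-like) basis: it is still a function of $X$, but it cannot beat the conditional expectation in the mean-square sense.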

The definition of eq. ([55]) is implicit. It can be made 
more explicit by introducing an indicator function or a 
Dirac measure. For a subset $B$ of the coarse-grained $X$ space we define the indicator function and the Dirac measure as follows

\[
1_B(X) = \delta_X[B] =
\begin{cases}
1 & \text{if } X \in B, \\
0 & \text{if } X \notin B.
\end{cases}
\tag{41}
\]



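Having a measure on the microscopic space one can define a pushforward measure on the coarse-grained space as

\[ X_*(\mu)[B] = \mu[X^{-1}(B)] = \int 1_B(X(\Gamma))\, d\mu[\Gamma]. \tag{42} \]

Taking $g(X) = 1_B(X)$ in eq. (38) one has,

\[
\int 1_B(X(\Gamma))\, A(\Gamma)\, d\mu[\Gamma]
 = \int 1_B(X(\Gamma))\, \mathrm{E}_\mu(A|X(\Gamma))\, d\mu[\Gamma]
 = \int_B \mathrm{E}_\mu(A|X)\, dX_*(\mu)[X].
\tag{43}
\]

Replacing the indicator function by the Dirac measure one can rewrite this as

\[ \int_B \mathrm{E}_\mu(A|X)\, dX_*(\mu)[X] = \int \delta_{X(\Gamma)}[B]\, A(\Gamma)\, d\mu[\Gamma]. \tag{44} \]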



If the relation $X(\Gamma)$ and the measure $\mu$ are smooth enough (more precisely, $X_*(\mu)$ is absolutely continuous with respect to the Lebesgue measure on the $X$-space) we can introduce a so-called Radon-Nikodym derivative that defines the entropy as

\[ \exp[S_\mu(X)] = \frac{dX_*(\mu)[X]}{dX}. \tag{45} \]

Here we use the convention $dX = d\mu_{\rm Lebesgue}[X]$. Taking the derivative of the left- and right-hand side of eq. (44) gives

\[ \mathrm{E}_\mu(A|X) = \exp[-S_\mu(X)] \int \delta[X(\Gamma) - X]\, A(\Gamma)\, d\mu[\Gamma], \tag{46} \]

where

\[ \delta[X(\Gamma) - X] = \frac{d\delta_{X(\Gamma)}[X]}{dX} \tag{47} \]

is the Dirac delta-function that should be interpreted as a distribution rather than a function on the $X$-space.

The projection operator, eq. (40), can be seen as averaging the function $A(\Gamma)$ over the subspace defined by $X(\Gamma) = X$. In the derivation (by means of introducing a Radon-Nikodym derivative) we assumed that $X(\Gamma)$ is a continuous function. In many cases macroscopic variables are discrete, e.g., the number of particles present in a cell in space. We will ignore this case in this paper and assume that all quantities are continuous.

The quantity $\exp[S_\mu(X)]$ is the microscopic phase space measure, $\mu$, per unit Lebesgue measure of macroscopic space. The convenience of introducing the Lebesgue measure is that it is translationally invariant. Using the pairing notation of quantities and their duals we can rewrite eq. (40) as

\[
\mathrm{E}_\mu(A|X) = \exp[-S_\mu(X)]\, \langle A\, \delta[X - X(\Gamma)], \mu \rangle
 = \langle A, \mu^{\rm micro}(X) \rangle,
\tag{48}
\]

where, at least formally,

\[ d\mu^{\rm micro}(X)[\Gamma] = \exp[-S_\mu(X)]\, \delta[X - X(\Gamma)]\, d\mu[\Gamma] \tag{49} \]

defines a microcanonical ensemble corresponding to macro state $X$.
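As a worked illustration of the entropy definition, eq. (45), consider a hypothetical example that is not taken from the text: a microscopic space $\Gamma \in \mathbb{R}^n$ with the Lebesgue measure and the single coarse variable $X(\Gamma) = \tfrac12 |\Gamma|^2$. Here $\Gamma_{\rm E}$ denotes the Euler gamma function, to avoid a clash with the microstate symbol.

```latex
\begin{aligned}
X_*(\mu)[\{X' \le X\}] &= \mathrm{Vol}\{\Gamma : |\Gamma| \le \sqrt{2X}\}
   = \frac{\pi^{n/2}}{\Gamma_{\rm E}(n/2+1)}\,(2X)^{n/2},\\
\exp[S_\mu(X)] &= \frac{d}{dX}\,X_*(\mu)[\{X' \le X\}]
   = \frac{(2\pi)^{n/2}}{\Gamma_{\rm E}(n/2)}\,X^{n/2-1},\\
S_\mu(X) &= \left(\tfrac{n}{2}-1\right)\ln X + \text{const}.
\end{aligned}
```

In this example the entropy is simply the logarithm of the area of the shell $X(\Gamma) = X$ per unit $X$, which is the familiar microcanonical counting for $n$ quadratic degrees of freedom.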

Let us now develop a generalized Langevin equation for the Zwanzig projection operator. So we will develop the terms arising in eq. (26) for the case of the optimal-prediction/Zwanzig projection. The first term

\[ \exp[\mathcal{L}t]\, \mathcal{P}_\mu \dot{A}_0 = \exp[\mathcal{L}t]\, \mathrm{E}_\mu(\dot{A}|X_0) = \mathrm{E}_\mu(\dot{A}|X_t), \tag{50} \]

can be interpreted as the conditional expectation of $\dot{A}$ with respect to the current macro state $X_t$.

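The integrand in the second term in the last equation of eq. (26) becomes

\[ \exp[\mathcal{L}t']\, \mathcal{P}_\mu \mathcal{L}\, \Delta A^{\rm fluct}_{t-t'} = \mathrm{E}_\mu(\mathcal{L}\, \Delta A^{\rm fluct}_{t-t'} | X_{t'}). \tag{51} \]

The combination $\mathcal{L}\,\Delta A^{\rm fluct}_{t-t'}$ is a bit problematic, because the generator of the dynamics for $A^{\rm fluct}$ is $\mathcal{Q}\mathcal{L}$, not $\mathcal{L}$.

Using eq. (40) with the delta-function definition, eq. (47), inserted gives for the numerator,

\[
\begin{aligned}
\langle (\mathcal{L}\, \Delta A^{\rm fluct}_{t-t'})\, \delta[X - X(\Gamma)], \mu \rangle
 &= -\langle \Delta A^{\rm fluct}_{t-t'}\, \mathcal{L}\, \delta[X - X(\Gamma)], \mu \rangle \\
 &\quad + \langle \Delta A^{\rm fluct}_{t-t'}\, \delta[X - X(\Gamma)], \mathcal{L}^{\dagger}\mu \rangle.
\end{aligned}
\tag{52}
\]

Simplifying the first term on the right-hand side gives,

\[
-\langle \Delta A^{\rm fluct}_{t-t'}\, \mathcal{L}\, \delta[X - X(\Gamma)], \mu \rangle
 = \frac{\partial}{\partial X} \cdot \langle \dot{X}_0\, \Delta A^{\rm fluct}_{t-t'}\, \delta[X - X(\Gamma)], \mu \rangle.
\tag{53}
\]

Putting also the denominator in eq. (40) back into place gives,

\[
\begin{aligned}
\mathrm{E}_\mu(\mathcal{L}\, \Delta A^{\rm fluct}_{t-t'} | X_{t'})
 &= \exp[-S_\mu(X_{t'})]\, \frac{\partial}{\partial X_{t'}} \cdot \left( \exp[S_\mu(X_{t'})]\, \mathrm{E}_\mu(\dot{X}_0\, \Delta A^{\rm fluct}_{t-t'} | X_{t'}) \right) \\
 &\quad + \exp[-S_\mu(X_{t'})]\, \langle \Delta A^{\rm fluct}_{t-t'}\, \delta[X_{t'} - X(\Gamma)], \mathcal{L}^{\dagger}\mu \rangle.
\end{aligned}
\tag{54}
\]

Here the last term is very inconvenient. It disappears if one uses an invariant measure, such that $\mathcal{L}^{\dagger}\mu = 0$. Note that it is not required that the measure is ergodic here. Since the Liouville measure is known, a priori, to be invariant it is a logical and convenient choice to take $\mu = \mu_L$. For this choice we will define

\[ \mathrm{E}(A|X) \equiv \mathrm{E}_{\mu_L}(A|X), \qquad S(X) \equiv S_{\mu_L}(X). \tag{55} \]

The relation (54) can also be used to derive a so-called degeneracy relation. For $\Delta A^{\rm fluct}_{t-t'} = 1$ one obtains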


\[ \frac{\partial}{\partial X} \cdot \left( \exp[S(X)]\, \mathrm{E}(\dot{X}|X) \right) = 0. \tag{56} \]

This can be read as a generalization of the Liouville theorem to the coarse-grained case.

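By taking $\dot{X}^{\rm fluct} = \mathcal{Q}\dot{X} = \dot{X} - \mathrm{E}(\dot{X}|X)$ and noting that $\mathrm{E}(\mathrm{E}(A|X)\, B|X) = \mathrm{E}(A|X)\, \mathrm{E}(B|X)$ one finds that

\[ \mathrm{E}(\dot{X}_0\, \Delta A^{\rm fluct}_{t-t'} | X_{t'}) = \mathrm{E}(\dot{X}^{\rm fluct}_0\, \Delta A^{\rm fluct}_{t-t'} | X_{t'}), \tag{57} \]

because $\mathrm{E}(\Delta A^{\rm fluct}_{t-t'}|X) = 0$ according to eq. (32). Summarizing:

\[
\mathrm{E}(\mathcal{L}\, \Delta A^{\rm fluct}_{t-t'} | X_{t'}) = \exp[-S(X_{t'})]\,
 \frac{\partial}{\partial X_{t'}} \cdot \left( \exp[S(X_{t'})]\, \mathrm{E}(\dot{X}^{\rm fluct}_0\, \Delta A^{\rm fluct}_{t-t'} | X_{t'}) \right).
\tag{58}
\]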



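The full non-linear Langevin equation for the Zwanzig projection is

\[
\Delta A_t = \int_0^t \mathrm{E}(\dot{A}|X_{t'})\, dt'
 + \int_0^t \exp[-S(X_{t'})]\, \frac{\partial}{\partial X_{t'}} \cdot \left( \exp[S(X_{t'})]\, \mathrm{E}(\dot{X}^{\rm fluct}_0\, \Delta A^{\rm fluct}_{t-t'} | X_{t'}) \right) dt'
 + \Delta A^{\rm fluct}_t.
\tag{59}
\]

Using $A = X$ one obtains an equation that can be used as a starting point for obtaining approximate coarse-grained equations for $X$,

\[
\Delta X_t = \int_0^t \mathrm{E}(\dot{X}|X_{t'})\, dt'
 + \int_0^t \exp[-S(X_{t'})]\, \frac{\partial}{\partial X_{t'}} \cdot \left( \exp[S(X_{t'})]\, M^{T}_{t-t'}(X_{t'}) \right) dt'
 + \Delta X^{\rm fluct}_t,
\tag{60}
\]

where

\[
X^{\rm fluct}_t = \exp[\mathcal{Q}\mathcal{L}t]\, X_0, \qquad
M_\tau(X) = \mathrm{E}(\Delta X^{\rm fluct}_\tau\, \dot{X}^{\rm fluct}_0 | X).
\tag{61}
\]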



The importance of the Zwanzig projection formalism for coarse-graining is that the fluctuating dynamics is split off. The time auto-correlations can be obtained by purely considering the fluctuating dynamics. One can solve the fluctuating dynamics and use the result as input to model the dynamics of $X$. The art of coarse-graining is to make a good choice for $X$. The $X$ should, preferably, be chosen such that the fluctuating dynamics is fast. This means that the $X^{\rm fluct}_t$ decorrelate quickly. When this is the case the fluctuations can be well approximated by means of a stochastic variable.

An important property of the Zwanzig projection is the following. For quantities $A$ that can be expressed as a function of $X$ (so $A(\Gamma) = A(X(\Gamma))$) one has

\[ \mathcal{P}(AB)(X) = \mathrm{E}(AB|X) = A(X)\, \mathrm{E}(B|X). \tag{62} \]

This is even a bit stronger than property eq. (36). Now consider two quantities $A(X)$ and $B(X)$ that depend on the macro state $X$. For these we have

\[
\begin{aligned}
\mathcal{P}\mathcal{L}(AB)(X) &= \mathrm{E}(\mathcal{L}(AB)|X) \\
 &= \mathrm{E}((\mathcal{L}A)\, B + A\, (\mathcal{L}B)|X) \\
 &= \mathrm{E}(\mathcal{L}A|X)\, B(X) + A(X)\, \mathrm{E}(\mathcal{L}B|X), \\
\mathcal{P}\mathcal{L}(AB) &= (\mathcal{P}\mathcal{L}A)\, B + A\, (\mathcal{P}\mathcal{L}B).
\end{aligned}
\tag{63}
\]

This result is obtained by combining the product rule for $\mathcal{L}$ with eq. (62). The last line tells us that the combined operator $\mathcal{P}\mathcal{L}$ obeys the product rule when acting on functions of $X$. By subtracting this equality from the product rule for $\mathcal{L}$ one also has

\[ \mathcal{Q}\mathcal{L}(AB) = (\mathcal{Q}\mathcal{L}A)\, B + A\, (\mathcal{Q}\mathcal{L}B). \tag{64} \]

So $\mathcal{Q}\mathcal{L}$ obeys the product rule and can therefore be interpreted as a derivation! This gives that for any (holomorphic) function $A(X)$,

\[ A^{\rm fluct}_t = A(X^{\rm fluct}_t). \tag{65} \]

Therefore, in the Zwanzig formalism, the fluctuating dynamics can be imagined as a trajectory through macroscopic phase space. This is certainly not the case for any of the other projection operator formalisms (Mori, Robertson, Grabert), because for these formalisms eq. (36) is not found to be valid. This means that in those cases, even when $A = A(X)$, one has $A^{\rm fluct}_t \neq A(X^{\rm fluct}_t)$.



V. STOCHASTIC DIFFERENTIAL EQUATIONS 

The generalized Langevin equation, eq. (60), is a formal decomposition of the microscopic equations of motion. It contains no new information. Full expressions of the fluctuating term $X^{\rm fluct}$ are very complicated. Its use lies in the fact that it can be used as a starting point for approximations.

Suitable choices for the macroscopic variables $X$ can be made. The usual approach is to choose the variables such that the remainder characterized by $X^{\rm fluct}$ decorrelates quickly. So, on the time scale on which the fluctuations decorrelate, $X_t$ has barely moved. Integrating eq. (60) over $\Delta t$ (larger than the decorrelation time) and replacing the dependence of functions on $X_{t'}$ by $X_0$ gives



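\[
\Delta X \approx \mathrm{E}(\dot{X}|X_0)\, \Delta t
 + \exp[-S(X_0)]\, \frac{\partial}{\partial X_0} \cdot \left( \exp[S(X_0)]\, \bar{M}^{T}(X_0) \right) \Delta t
 + \Delta X^{\rm fluct}_{\Delta t},
\tag{66}
\]

with

\[
\bar{M}(X_0) \approx \frac{1}{\Delta t} \int_0^{\Delta t} \mathrm{E}(\Delta X^{\rm fluct}_{t'}\, \dot{X}^{\rm fluct}_0 | X_0)\, dt'.
\tag{67}
\]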



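The modeling assumption is that (complete) decorrelation is very fast, i.e., the change of $X$ is very small on a time scale $\tau$. One is interested in phenomena on time scales much larger than $\tau$. Under the same assumptions one finds that for $\Delta t = O(\tau)$,

\[
\begin{aligned}
\mathrm{E}(\Delta X^{\rm fluct}_{\Delta t}\, \Delta X^{\rm fluct}_{\Delta t} | X_0)
 &= \int_0^{\Delta t} \Big( \mathrm{E}(\dot{X}^{\rm fluct}_{t'}\, (X^{\rm fluct}_{\Delta t} - X^{\rm fluct}_{t'}) | X_0)
   + \mathrm{E}((X^{\rm fluct}_{\Delta t} - X^{\rm fluct}_{t'})\, \dot{X}^{\rm fluct}_{t'} | X_0) \Big)\, dt' \\
 &\approx \int_0^{\Delta t} \Big( \mathrm{E}(\dot{X}^{\rm fluct}_0\, (X^{\rm fluct}_{\Delta t - t'} - X^{\rm fluct}_0) | X_0)
   + \mathrm{E}((X^{\rm fluct}_{\Delta t - t'} - X^{\rm fluct}_0)\, \dot{X}^{\rm fluct}_0 | X_0) \Big)\, dt' \\
 &\approx (\bar{M}^{T}(X_0) + \bar{M}(X_0))\, \Delta t = 2 M(X_0)\, \Delta t.
\end{aligned}
\tag{68}
\]

By definition, $M$ is symmetric and positive (semi-)definite. However, $\bar{M}$ is not necessarily so. In the general case one can write

\[ \bar{M} = M + A, \tag{69} \]

where $A$ is anti-symmetric, i.e., $A^{T} = -A$.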



Arguments like the central limit theorem (more specifically Donsker's theorem) can be used to argue that $\Delta X^{\rm fluct}$ can be well approximated using a Wiener process $W$,

\[ \Delta X^{\rm fluct} = \sqrt{2 M(X_0)} \cdot \Delta W_t, \tag{70} \]

where $M$ is a positive definite matrix and $\Delta W_t = W_t - W_0$. A Wiener process is a Gaussian stochastic process. Each increment over a time-step $\Delta t$ has zero average and variance $\Delta t$,

\[ \mathrm{E}(\Delta W | X_0) = 0, \quad \text{and} \quad \mathrm{E}(\Delta W\, \Delta W | X_0) = I\, \Delta t. \tag{71} \]

Increments over non-overlapping time intervals are sta- 
tistically independent. The stochastic term on the rhs 
of eq. (70) should be read using the so-called Ito-
interpretation (see, e.g., [l^, H^). This means that the 
expectation value of the increment is zero. This require- 
ment is consistent with the property of increments of fluc- 
tuating quantities as having a zero expectation value with 
respect to the initial state, eq. (32).

Note that the expectation value in eq. (71) is different from the expectation values in eq. (66). In eq. (66) the expectation value denotes an integration over all microstates $\Gamma$ that obey $X(\Gamma) = X_0$. In eq. (71) it is the expectation over the measure corresponding to the Wiener process. For this second expectation value, by means of Bayes' theorem, $\mathrm{E}(\Delta W|X_0) = 0$ implies that $\mathrm{E}(X_0|\Delta W) = 0$. This is automatically obeyed if we model $X_t$ to be an adapted (or non-anticipating) process. This means that $X_t$ is independent of future events. The assumption that $X_t$ is non-anticipating is always made when using stochastic differential equations.

Under the assumption of rapidly decorrelating fluctuations, the generalized Langevin equation, eq. (60), can be simplified to a stochastic differential equation. First consider eq. (60) for $\Delta t \gg \tau$, then use the modeling assumption that $X_t$ is slow and that the fluctuating term can be modeled as Gaussian noise. One obtains a stochastic difference equation, eq. (66) (valid only after integration over $\Delta t \gg \tau$), which can be well approximated by the stochastic differential equation

\[
\begin{aligned}
dX_t &= \mathrm{E}(\dot{X}|X_t)\, dt + \exp[-S]\, \frac{\partial}{\partial X_t} \cdot \left[ \bar{M}^{T} \exp[S] \right] dt + \sqrt{2M} \cdot dW_t \\
 &= \mathrm{E}(\dot{X}|X_t)\, dt + \bar{M} \cdot \frac{\partial S}{\partial X_t}\, dt + \frac{\partial}{\partial X_t} \cdot \bar{M}^{T}\, dt + \sqrt{2M} \cdot dW_t.
\end{aligned}
\tag{72}
\]



This stochastic differential equation has three main contributions: an instantaneous part, a biased part and a fluctuating (random) part. The first term on the rhs gives the instantaneous change of $X_t$ averaged over all possible microstates consistent with this state. The last term models the fluctuations with respect to this average motion. On time scales larger than the decorrelation time, $\tau$, this is effectively modeled by means of a white noise, or Wiener, process. The biased part (at least the symmetric part) gives a drift toward macro states with higher entropy. This bias can be explained intuitively by the argument that these regions correspond to a larger micro-phase-space volume.
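The entropic drift can be made concrete with a small simulation. The sketch below is an illustration under stated assumptions rather than the paper's method: it takes a one-dimensional state, a constant friction coefficient `m` (so the divergence term vanishes), no instantaneous part, and a toy entropy $S(X) = -X^2/2$. The integrator is a plain Euler-Maruyama step in the Ito sense; the function name `euler_maruyama` is hypothetical.

```python
import numpy as np

def euler_maruyama(x0, drift, m, dt, n_steps, rng):
    """Ito/Euler-Maruyama integration of dX = drift(X) dt + sqrt(2 m) dW, constant m."""
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        dw = rng.normal(scale=np.sqrt(dt), size=x.shape)  # E(dW) = 0, E(dW^2) = dt
        x = x + drift(x) * dt + np.sqrt(2.0 * m) * dw
    return x

rng = np.random.default_rng(1)
m = 0.5                                  # assumed constant friction coefficient

def entropy_gradient(x):
    """dS/dX for the assumed toy entropy S(X) = -X^2/2."""
    return -x

# Start all walkers far from the entropy maximum and let them relax.
walkers = euler_maruyama(np.full(20_000, 3.0),
                         lambda x: m * entropy_gradient(x),
                         m, dt=0.01, n_steps=2_000, rng=rng)
sample_mean, sample_var = walkers.mean(), walkers.var()
```

Under these assumptions the walkers drift toward the entropy maximum and settle into a distribution proportional to $\exp[S(X)]$, i.e. zero mean and unit variance up to sampling and time-step error.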

We will end this section with a philosophical note. The approximation eq. (66) is a controlled approximation of a formal result derived from reversible dynamics. It is therefore valid irrespective of the direction of time. When playing the movie of life backwards in time it remains (to good approximation) valid. However, in this case approximating $\Delta X^{\rm fluct}_t$ as a stochastic process gives incorrect results. Equation (72) will predict that entropy increases (on average), but when playing the movie of life backwards entropy decreases. We speculate that the fact that eq. (66) can only be approximated well with a simple coarse-grained equation, such as eq. (72), for integration in the future direction of time is an important part of the explanation of the arrow of time.

The behavior of humans (and animals) is based on pre- 
dictions (what-if strategies) using prior information. Our 
brain stores coarse-level scale information and uses this to 
plan future actions. Making predictions, based on coarse- 
level information, i.e. X, is only possible in the "for- 
ward direction" of time. Coarse-grained equations such 
as eq. (72) can give accurate predictions in the direction
in which entropy increases (on average). Therefore mak- 
ing predictions, based on coarse-level information, is only 
possible in the direction of increasing entropy. Therefore 
human behavior, with the usual asymmetric notion of 
past and future, is only possible in the direction of in- 
creasing entropy. The statement that entropy increases 



in the direction of time called future is a tautology. 



A. Change of variables 

As a preparation for the description of coarse-graining in the next section let us first consider the behavior of the derived stochastic differential equation under a change of variables. This will illustrate a non-trivial transformation rule for the entropy. Let us consider the injective (one-to-one) transformation $Y(X)$.

One way to obtain the governing equation for $Y$ is the use of Ito-calculus. The most common way to write stochastic differential equations is the Ito-interpretation. Here integrands in a time integral are evaluated at the initial time of each time-increment. In this notation the Leibniz rule (chain rule) for differentiation is not valid. The main reason is the asymmetry between the treatment of the initial point and final point in a time-step. A different notation is the Stratonovich interpretation. Here integrands are, in a finite difference approximation, evaluated more symmetrically at the point $X_0 + \tfrac12 \Delta X$ instead of $X_0$. Heuristically, applying the chain rule in the Stratonovich interpretation (indicated by an open dot) and applying a second-order Taylor expansion one obtains the rules for Ito-calculus. Here quadratic terms $dW\, dW$ can be replaced by the expectation value $I\, dt$, as given by eq. (71). These heuristic rules can, of course, be
rigorously proved, [1^ . For transformation of stochastic 
differential equations the following rules hold, 



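\[
\begin{aligned}
dY &= \frac{\partial Y}{\partial X} \circ dX
  = \frac{\partial Y}{\partial X} \cdot dX + \frac{1}{2} \frac{\partial^2 Y}{\partial X^2} : dX\, dX
  = \frac{\partial Y}{\partial X} \cdot dX + \frac{\partial^2 Y}{\partial X^2} : M\, dt \\
 &= \frac{\partial Y}{\partial X} \cdot \mathrm{E}(\dot{X}|X)\, dt
  + \frac{\partial Y}{\partial X} \cdot \exp[-S(X)]\, \frac{\partial}{\partial X} \cdot \left[ \bar{M}^{T} \exp[S(X)] \right] dt
  + \frac{\partial^2 Y}{\partial X^2} : M\, dt
  + \frac{\partial Y}{\partial X} \cdot \sqrt{2M} \cdot dW \\
 &= \mathrm{E}(\dot{Y}|Y)\, dt + \exp[-S(Y)]\, \frac{\partial}{\partial Y} \cdot \left[ (\bar{M}^{Y})^{T} \exp[S(Y)] \right] dt + \sqrt{2 M^{Y}} \cdot dW.
\end{aligned}
\tag{73}
\]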



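The form of the equation for $Y$ should be the same as that for $X$, as is the case in the last line of eq. (73). To obtain this last form the following transformation rules need to be used

\[
\begin{aligned}
\mathrm{E}(\dot{Y}|Y) &= \frac{\partial Y}{\partial X} \cdot \mathrm{E}(\dot{X}|X), \\
M^{Y} &= \frac{\partial Y}{\partial X} \cdot M \cdot \left( \frac{\partial Y}{\partial X} \right)^{T}, \\
S(Y) &= S(X) - \ln \left| \det \frac{\partial Y}{\partial X} \right|.
\end{aligned}
\tag{74}
\]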



The transformation of $\mathrm{E}(\dot{Y}|Y)$ and $M$ follows immediately from the fact that one can take functions of $X$ (such as $\partial Y/\partial X$) out of the expectation value, eq. (62). The transformation for the entropy might be more surprising. Inserting the transformations into the last line of eq. (73) and doing the required calculus proves the equalities. Alternatively, it follows directly from the definition, eq. (45),

\[
\exp[S(Y)] = \frac{dY_*(\mu_L)[Y]}{dY}
 = \frac{dX}{dY}\, \frac{dX_*(\mu_L)[X]}{dX}
 = \frac{\exp[S(X)]}{\left| \det \frac{\partial Y}{\partial X} \right|}.
\tag{75}
\]



Here $dX/dY$ is the Radon-Nikodym derivative of the Lebesgue measure in the $X$ space to the Lebesgue measure in the $Y$ space, giving rise to the Jacobian determinant. A somewhat more sloppy, but easier to handle, definition of the entropy is by means of the Dirac delta distribution,

\[
\exp[S(Y)] = \int \delta[Y - Y(\Gamma)]\, d\mu_L[\Gamma]
 = \int \delta[Y - Y(X)]\, \exp[S(X)]\, dX
 = \frac{\exp[S(X)]}{\left| \det \frac{\partial Y}{\partial X} \right|}.
\tag{76}
\]

In this way the Jacobian determinant arises by means of the transformation of the delta distribution.

The final form in eq. (73) is just the generic form eq. (72) (with $X$ replaced by $Y$). The route via Ito-calculus is tedious. Performing this exercise results in the important observation that, to keep the equation in the correct form, the non-scalar transformation rule of the entropy has to be taken into account. Usually in thermodynamics one quickly goes to the thermodynamic limit (see §VII A) and uses a scalar entropy. Introducing a scalar entropy at the stage where fluctuations are important, one runs into trouble. If all terms in the equation transformed in a tensorial (or scalar) way then the Ito-calculus would ruin the transformation. The fact that the strangely transforming entropy $S(X)$ appears in the equation saves the day.
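The non-scalar transformation rule of the entropy can be verified numerically in one dimension. The following sketch assumes a toy entropy $S(X) = -X^2/2$ and the injective map $Y = \exp(X)$, neither of which is taken from the text; it checks that the measure of the same set of macro states agrees in both coordinates only when the Jacobian term is included. The helper `trapezoid` is written out by hand.

```python
import numpy as np

def trapezoid(f, grid):
    """Plain trapezoidal rule, written out to avoid NumPy version differences."""
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(grid)))

def s_x(x):
    """Assumed toy entropy S(X) = -X^2/2."""
    return -0.5 * x**2

def s_y(y):
    """Transformed entropy S(Y) = S(X) - ln|dY/dX| for Y = exp(X), where dY/dX = Y."""
    return s_x(np.log(y)) - np.log(y)

x_grid = np.linspace(-1.0, 2.0, 20_001)
y_grid = np.linspace(np.exp(-1.0), np.exp(2.0), 20_001)

measure_x = trapezoid(np.exp(s_x(x_grid)), x_grid)   # measure of -1 < X < 2
measure_y = trapezoid(np.exp(s_y(y_grid)), y_grid)   # same set, expressed in Y
# Treating the entropy as a scalar (no Jacobian term) gives a different answer.
measure_scalar = trapezoid(np.exp(s_x(np.log(y_grid))), y_grid)
```

Here `measure_x` and `measure_y` agree to within quadrature error, whereas `measure_scalar` does not, which is the one-dimensional content of the rule $S(Y) = S(X) - \ln|\det \partial Y/\partial X|$.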



VI. SUCCESSIVE COARSE-GRAINING 

In this section we will discuss how to coarse-grain an already coarse-grained description further. Suppose the intermediate level is described by a state $X$, and the more coarse-grained level can be expressed as $Y(X)$. Here the relation $X \to Y$ is many-to-one (surjective). We will denote the (Zwanzig) projection operators as

\[ \mathcal{P}A = \mathrm{E}(A|X) \quad \text{and} \quad \mathcal{P}^{Y}A = \mathrm{E}(A|Y). \tag{77} \]

Since $\mathrm{E}(\mathrm{E}(A|X)|Y) = \mathrm{E}(A|Y)$ one finds relations like $\mathcal{P}^{Y}\mathcal{P} = \mathcal{P}\mathcal{P}^{Y} = \mathcal{P}^{Y}$ and $\mathcal{Q}\mathcal{Q}^{Y} = \mathcal{Q}^{Y}\mathcal{Q} = \mathcal{Q}$. The decomposition of the coarse-grained dynamics of $Y_t$ obeys a similar equation as that of $X$; the difference is that here $\mathcal{P}^{Y}$ instead of $\mathcal{P}$ is used for the decomposition. In some of the steps it is more convenient to work with the projection operators instead of the conditional expectations. So we will switch between the two representations.

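The instantaneous part of the evolution equation of $Y$ can be obtained straightforwardly,

\[
\begin{aligned}
\mathcal{P}^{Y}\mathcal{L}Y &= \mathrm{E}(\mathcal{L}Y|Y) = \mathrm{E}(\mathrm{E}(\mathcal{L}Y|X)|Y)
 = \mathrm{E}\!\left( \mathrm{E}(\mathcal{L}X|X) \cdot \frac{\partial Y}{\partial X} \,\Big|\, Y \right) \\
 &= \exp[-S(Y)] \int \mathrm{E}(\mathcal{L}X|X) \cdot \frac{\partial Y}{\partial X}\, \delta[Y - Y(X)]\, \exp[S(X)]\, dX.
\end{aligned}
\tag{78}
\]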
To obtain the fluctuating contribution to the dynamics of $Y$ we want to evaluate $Y^{Y,\rm fluct}_t = \exp[\mathcal{Q}^{Y}\mathcal{L}t]\, Y_0$.

For $Y^{\rm fluct}_t = \exp[\mathcal{Q}\mathcal{L}t]\, Y_0$ we have $Y^{\rm fluct}_t = Y(X^{\rm fluct}_t)$. This equality is a unique property of the Zwanzig formalism as given in eq. (65). Note, however, that $Y^{Y,\rm fluct}_t \neq Y(X^{Y,\rm fluct}_t)$. The reason is that $\mathcal{Q}^{Y}\mathcal{L}$ does not act as a derivation (does not obey the chain rule) for functions of $X$. Therefore, to construct $Y^{Y,\rm fluct}_t$ one should, in the general case, consider both $Y^{Y,\rm fluct}_t$ and $X^{Y,\rm fluct}_t$.

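When identifying

\[ \mathcal{Q}^{Y}\mathcal{L} = \mathcal{P}\mathcal{Q}^{Y}\mathcal{L} + \mathcal{Q}\mathcal{Q}^{Y}\mathcal{L} = (\mathcal{P} - \mathcal{P}^{Y})\mathcal{L} + \mathcal{Q}\mathcal{L}, \tag{79} \]

this equality can be used to adapt eq. (26), with the substitution $\mathcal{L} \to \mathcal{Q}^{Y}\mathcal{L}$, to obtain for a general quantity $A$,

\[
\Delta A^{Y,\rm fluct}_t = \int_0^t \exp[\mathcal{Q}^{Y}\mathcal{L}t']\, (\mathcal{P} - \mathcal{P}^{Y})\mathcal{L}A_0\, dt'
 + \int_0^t \exp[\mathcal{Q}^{Y}\mathcal{L}t']\, (\mathcal{P} - \mathcal{P}^{Y})\mathcal{L}\, \Delta A^{\rm fluct}_{t-t'}\, dt'
 + \Delta A^{\rm fluct}_t.
\tag{80}
\]

Similarly to the development of eq. (58) one finds

\[
\begin{aligned}
(\mathcal{P} - \mathcal{P}^{Y})\mathcal{L}\, \Delta A^{\rm fluct}_{t-t'}
 &= \exp[-S(X)]\, \frac{\partial}{\partial X} \cdot \left( \exp[S(X)]\, \mathrm{E}(\dot{X}^{\rm fluct}_0\, \Delta A^{\rm fluct}_{t-t'} | X) \right) \\
 &\quad - \exp[-S(Y)]\, \frac{\partial}{\partial Y} \cdot \left( \exp[S(Y)]\, \mathrm{E}(\dot{Y}^{\rm fluct}_0\, \Delta A^{\rm fluct}_{t-t'} | Y) \right).
\end{aligned}
\tag{81}
\]

In terms of conditional expectation values this gives,

\[
\begin{aligned}
\Delta A^{Y,\rm fluct}_t &= \int_0^t \left( \mathrm{E}(\dot{A} | X^{Y,\rm fluct}_{t'}) - \mathrm{E}(\dot{A} | Y^{Y,\rm fluct}_{t'}) \right) dt' \\
 &\quad + \int_0^t \Big( \exp[-S(X^{Y,\rm fluct}_{t'})]\, \frac{\partial}{\partial X^{Y,\rm fluct}_{t'}} \cdot \left( \exp[S(X^{Y,\rm fluct}_{t'})]\, \mathrm{E}(\dot{X}^{\rm fluct}_0\, \Delta A^{\rm fluct}_{t-t'} | X^{Y,\rm fluct}_{t'}) \right) \\
 &\qquad\quad - \exp[-S(Y^{Y,\rm fluct}_{t'})]\, \frac{\partial}{\partial Y^{Y,\rm fluct}_{t'}} \cdot \left( \exp[S(Y^{Y,\rm fluct}_{t'})]\, \mathrm{E}(\dot{Y}^{\rm fluct}_0\, \Delta A^{\rm fluct}_{t-t'} | Y^{Y,\rm fluct}_{t'}) \right) \Big)\, dt' \\
 &\quad + \Delta A^{\rm fluct}_t.
\end{aligned}
\tag{82}
\]

Since $X^{\rm fluct}_t$ and its statistics are known (note that $A^{\rm fluct}_t = A(X^{\rm fluct}_t)$) one can solve this equation, in principle. Note that because (in general) $Y^{Y,\rm fluct}_t \neq Y(X^{Y,\rm fluct}_t)$ one needs to simultaneously solve these equations for $A = X$ and $A = Y$.

A. The linear case

To get a feeling for the general equation we will look at the special case where

\[ Y = B \cdot X \quad \text{and} \quad S(X) = -\tfrac12\, X \cdot A \cdot X, \tag{83} \]

which gives, using the entropy definition, that

\[ S(Y) = c - \tfrac12\, Y \cdot A^{Y} \cdot Y, \quad \text{with} \quad A^{Y} = (B \cdot A^{-1} \cdot B^{T})^{-1}. \tag{84} \]

Here $B$ and $A$ are taken to be independent of $X$. Furthermore we will assume that $M_t = \mathrm{E}(\Delta X^{\rm fluct}_t\, \dot{X}^{\rm fluct}_0 | X)$ is also independent of $X$. The instantaneous part, $\mathrm{E}(\dot{X}|X)$, will be left out of consideration here and is taken to be zero.

Note that our starting point is, explicitly, not a stochastic differential equation. Making that approximation would mean that we lose the information on $\dot{X}^{\rm fluct}_0$. The derivation is much simpler if this information is still available. Clearly, in practice, we often start from the stochastic level. Some subtleties that arise in this case will be discussed later.

The final goal is to find an expression for $\mathrm{E}(\Delta Y^{Y,\rm fluct}_t\, \dot{Y}^{Y,\rm fluct}_0 | Y)$. When this expression is known the coarse-grained equation for $Y$ can be written down. In this special case one has $Y^{Y,\rm fluct}_t = B \cdot X^{Y,\rm fluct}_t$. Therefore we only need to study $X^{Y,\rm fluct}_t$ and can obtain the information on $Y^{Y,\rm fluct}_t$ from this. Inserting the assumptions into eq. (82) gives

\[
\Delta X^{Y,\rm fluct}_t = -\int_0^t M_{t-t'} \cdot (A - B^{T} \cdot A^{Y} \cdot B) \cdot X^{Y,\rm fluct}_{t'}\, dt' + \Delta X^{\rm fluct}_t.
\tag{85}
\]

This equation can be solved using a Laplace transform. After performing this transform we multiply by $\dot{X}^{Y,\rm fluct}_0 = \dot{X}^{\rm fluct}_0$ to obtain $M_t$ and $M^{Y,\rm fluct}_t = \mathrm{E}(\Delta X^{Y,\rm fluct}_t\, \dot{X}^{Y,\rm fluct}_0 | X)$. This gives

\[
[1 + M_s \cdot \bar{A}] \cdot M^{Y,\rm fluct}_s = M_s, \quad \text{i.e.} \quad M^{Y,\rm fluct}_s = [1 + M_s \cdot \bar{A}]^{-1} \cdot M_s,
\tag{86}
\]

where $\bar{A} = (A - B^{T} \cdot A^{Y} \cdot B)$. Here the subscript $s$ indicates the Laplace-transform variable. If one has a short correlation time $\tau$, then $M_s = s^{-1} M$, where $M$ is independent of $s$, for $s \ll \tau^{-1}$. For a stochastic differential equation, i.e. in the limit $\tau \to 0$, this equality is found for all $s$. For this limit we thus have

\[ M^{Y,\rm fluct}_s = [s + M \cdot \bar{A}]^{-1} \cdot M. \tag{87} \]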



If $\lambda_k$ denote the eigenvalues of $M \cdot \bar{A}$ this function of $s$ will be singular for values $s = -\lambda_k$. Performing the inverse Laplace transform one obtains contributions that decay as $\exp[-\lambda_k t]$. For times much larger than $\lambda_0^{-1}$, where $\lambda_0$ is the smallest non-zero eigenvalue, these contributions have decayed.
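The route from the memory equation to the resolvent form and its decaying modes can be spelled out step by step. This is a sketch under the assumptions of the linear case (constant $A$, $B$ and $M_t$, vanishing instantaneous part); the symbols $\bar{A} = A - B^{T} \cdot A^{Y} \cdot B$ and $M^{Y,\rm fluct}$, and in particular the placement of the transposes, are assumptions about the index conventions. It also uses $\mathrm{E}(X_0\, \dot{X}^{\rm fluct}_0 | X) = 0$.

```latex
\begin{aligned}
M^{Y,\rm fluct}_t &= M_t - \int_0^t M_{t-t'}\cdot\bar A\cdot M^{Y,\rm fluct}_{t'}\,dt'
  && \text{(memory equation times $\dot X_0^{\rm fluct}$, averaged)}\\
M^{Y,\rm fluct}_s &= M_s - M_s\cdot\bar A\cdot M^{Y,\rm fluct}_s
  && \text{(convolution theorem)}\\
M^{Y,\rm fluct}_s &= [1 + M_s\cdot\bar A]^{-1}\cdot M_s\\
 &= [s + M\cdot\bar A]^{-1}\cdot M
  && \text{(using $M_s = s^{-1}M$)}\\
M^{Y,\rm fluct}_t &= \exp[-M\cdot\bar A\,t]\cdot M
  && \text{(inverse transform)}
\end{aligned}
```

Expanding the exponential in the eigenbasis of $M \cdot \bar{A}$ gives the contributions decaying as $\exp[-\lambda_k t]$, while the null space of $M \cdot \bar{A}$ contributes a part that does not decay.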

One can construct a projection matrix from the left and right null-vectors of $M \cdot \bar{A}$. The right null space is determined by the null space of $\bar{A}$, spanned by the "columns" of $A^{-1} \cdot B^{T}$, and the left one by the "rows" of $B \cdot A^{-1} \cdot M^{-1}$. This projection matrix is

\[
Q = (A^{-1} \cdot B^{T}) \cdot G^{-1} \cdot (B \cdot A^{-1} \cdot M^{-1})
 \quad \text{with} \quad G = B \cdot A^{-1} \cdot M^{-1} \cdot A^{-1} \cdot B^{T}.
\tag{88}
\]

It has the property that $Q \cdot M \cdot \bar{A} = M \cdot \bar{A} \cdot Q = 0$. The matrix $G$ is found from contraction of the vectors spanning the left and right null-spaces.
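The algebraic properties of this projection matrix are easy to check numerically. The sketch below assumes symmetric positive-definite $A$ and $M$ generated at random, a random map $B$, and the transpose placement $A^{Y} = (B A^{-1} B^{T})^{-1}$, $\bar{A} = A - B^{T} A^{Y} B$; these conventions and all variable names are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(2)
n_x, n_y = 5, 2

def random_spd(n):
    """Random symmetric positive-definite matrix (illustrative input only)."""
    r = rng.normal(size=(n, n))
    return r @ r.T + n * np.eye(n)

A, M = random_spd(n_x), random_spd(n_x)   # entropy Hessian and friction matrix
B = rng.normal(size=(n_y, n_x))           # coarse-graining map Y = B . X

A_inv, M_inv = np.linalg.inv(A), np.linalg.inv(M)
A_Y = np.linalg.inv(B @ A_inv @ B.T)      # coarse-grained entropy Hessian
A_bar = A - B.T @ A_Y @ B                 # annihilates the columns of A^-1 B^T

right = A_inv @ B.T                       # columns: right null vectors of M A_bar
left = B @ A_inv @ M_inv                  # rows: left null vectors of M A_bar
G = left @ right
Q = right @ np.linalg.inv(G) @ left       # projection onto the null space

M_Y = B @ Q @ M @ B.T                     # coarse-grained friction matrix
M_Y_alt = np.linalg.inv(A_Y) @ np.linalg.inv(G) @ np.linalg.inv(A_Y)
```

With these inputs `Q` is idempotent, annihilates $M \cdot \bar{A}$ from both sides, and the two expressions for the coarse-grained friction matrix coincide and are symmetric.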

Applying this projection to eq. (87) one finds that

\[ Q \cdot M^{Y,\rm fluct}_s = s^{-1}\, Q \cdot M. \tag{89} \]

Applying $P = 1 - Q$ to eq. (87), the contributions of $[s + P \cdot M \cdot \bar{A} \cdot P]^{-1}$ that decay quicker than $\lambda_0^{-1}$ remain, but the zero eigenvalues are filtered out. This gives that for long times (when the other contributions have decayed)

\[ M^{Y,\rm fluct}_t = Q \cdot M. \tag{90} \]

Using this expression we find that

\[
M^{Y} = \lim_{t \to \infty} \mathrm{E}(\Delta Y^{Y,\rm fluct}_t\, \dot{Y}^{Y,\rm fluct}_0 | Y)
 = B \cdot Q \cdot M \cdot B^{T} = (A^{Y})^{-1} \cdot G^{-1} \cdot (A^{Y})^{-1}.
\tag{91}
\]

Here we assumed that $M$ and $A$ are invertible. This is not the most general case. It is not even the typical case, because typically there are conserved quantities present in the system. Usually the coarse-grained variables are chosen such that conserved quantities can be expressed using these variables. These quantities give rise to null-vectors of $M$ because conserved quantities do not fluctuate. Also $A$ can have zero eigenvalues. For example, imagine the case where a Brownian particle is bound to the region near a plane by an entropic force, but is free to move in the direction parallel to the plane. For the linear case considered here, these unconstrained directions are not coarse-grained (otherwise $A^{Y}$ given by eq. (84) would be ill-defined). For the singular matrix $M$ the left null vectors are uninteresting. They will give contributions to $Q$ that, when used in the multiplication $Q \cdot M$, will give zero. For a singular matrix $M \cdot \bar{A}$ we need to find vectors $v_\alpha$ that solve the following problem. We first construct the null-vectors of $\bar{A}$,

\[ u_\alpha \cdot A = w_\alpha \cdot B, \]

and then solve

\[ v_\alpha \cdot M = u_\alpha. \tag{92} \]

Here the rhs in the last equation, i.e. Ua, needs to be in 
the null-space of A, which gives rise to the first equation. 
If A is singular there are more a's then a"s. Furthermore 
Ua should be in the range of M for a solution to exist. 
Let $z_\beta$ be right null vectors of $M$ (i.e. $M \cdot z_\beta = 0$) then this means that $u_\alpha$ has to obey

\[ u_\alpha \cdot z_\beta = 0, \quad \forall \beta. \tag{93} \]

If M and A are invertible Wa can be chosen to be the 
base vectors of the coarse-grained space and the previous 
result is recovered. 

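Clearly $v_\alpha$ is determined up to a linear combination of left null-vectors of $M$. Therefore we will pose the extra condition

\[ v_\alpha \cdot z_\beta = 0, \quad \forall \beta, \tag{94} \]

to fix $v_\alpha$. Having solved this equation one can define

\[ G_{\alpha\beta} = v_\alpha \cdot u_\beta, \qquad Q = u_\alpha\, [G^{-1}]_{\alpha\beta}\, v_\beta. \tag{95} \]

The rank of $Q$ is determined by the number of independent $w_\alpha$'s.

Applying this projection to eq. (86) one finds that

\[ M^{Y} = B \cdot Q \cdot M \cdot B^{T} = B \cdot u_\alpha\, [G^{-1}]_{\alpha\beta}\, u_\beta \cdot B^{T}. \tag{96} \]

Note that if $\bar{M}$ is symmetric (i.e. $\bar{M} = M$) then also $M^{Y}$ is symmetric. This can be seen from the fact that in this case $G_{\alpha\beta} = v_\alpha \cdot M \cdot v_\beta$ is symmetric.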


The general picture that arises is the following. For the 
long time behavior there are "constrained" and "uncon- 
strained" directions. For directions into which X^'-'^'"^* 
changes but not there is an restoring entropic 

driving force. For directions of X^'^^^'^* in the 
plane there is unconstrained motion. This motion is 
filtered out by Q and contributes to Af**. We believe 
that this picture remains valid for the general equation, 
eq. (El]). 



B. Connection to homogenization theory 

To illustrate the derived formulas we will give an outline for the case of diffusion. In the current paper we will not go into the details of deriving continuum equations. This example is only meant to illustrate the power of the derived relation. We will consider the case of a spatial concentration that deviates a little bit from the equilibrium concentration. The main variables are $\delta c(r) = c(r) - c_{\rm eq}(r)$. By considering a small deviation we can remain in the (linear) framework outlined above. The identification with the general theory is $X \to \delta c$. The coordinates in space play the role of indices, i.e., $X_i \to \delta c(r_i)$. The contractions indicated by the dots are replaced by integrals. The entropy is given by

\[ S = -\int \frac{\delta c(r)^2}{2\, c_{\rm eq}(r)}\, dV. \tag{97} \]

For this expression we can identify $A(r, r') = c_{\rm eq}^{-1}(r)\, \delta(r - r')$. The matrix $M$ can be identified as

\[ f \cdot M \cdot g = \int \nabla f(r) \cdot c_{\rm eq}(r)\, D(r) \cdot \nabla g(r)\, d^3r. \tag{98} \]

Here $D(r)$ is the (position dependent) diffusion matrix. This "matrix" is symmetric (so $\bar{M} = M$), but also singular. Constant functions, i.e. independent of $r$, span the (1-dimensional) null-space of $M$.

As coarse-grained variables let us consider a finite number of Fourier modes $\delta c_k$ corresponding to small wave vectors $k$,

\[ \delta c_k = \int \exp[i k \cdot r]\, \delta c(r)\, d^3r, \quad \text{so} \quad B_k(r) = \exp[i k \cdot r]. \tag{99} \]

Therefore the null-vectors of $\bar{A}$, which give the right null vectors of $M \cdot \bar{A}$, are $c_{\rm eq}(r) \exp[i k \cdot r]$. The corresponding left null vectors can be found by solving,

\[
-\nabla \cdot \left( c_{\rm eq}(r)\, D(r) \cdot \nabla v(k, r) \right)
 = c_{\rm eq}(r) \exp[i k \cdot r] - a(k)\, c_{\rm eq}(r) \sum_{k'} \exp[i k' \cdot r].
\tag{100}
\]

Here $a(k)$ should be chosen such that the right-hand side is in the range of $M$,



\[
a(k) = \frac{c_{{\rm eq},k}}{\sum_{k'} c_{{\rm eq},k'}}, \quad \text{with} \quad
c_{{\rm eq},k} = \int c_{\rm eq}(r) \exp[i k \cdot r]\, d^3r.
\tag{101}
\]

If we assume that $c_{\rm eq}(r)$ does vary rapidly on small length scales but is homogeneous on larger length scales then $c_{{\rm eq},k} = 0$ if the $k$'s are small enough. Solving the "cell problem" one finds $v(k, r)$. The effective diffusion coefficient is then given by:



^-ff " V{c, 



cq/ 



v{r,k)ccq{r) exp[—ik ■ r]d'^r. (102) 



For the one dimensional case one finds that 

{x)D{x)—v{k,x)^ 2, / Ccq{x') ex-p[ikx']dx' 

e:xjp[ikx']Ax' 



Xj<x 



-cq/ 



1 

ik 



[ccq) exp[ikx] 



(103) 



Applying this procedure a second time then gives 

1 v-^ r^j+i 
v{x,k) K - — {ccq) \ / [ccq{x')D{x')]^^ exp[ikx']dx' 

1 



^2 (Ccq) ( [Ccq D] ^ ) BXp [ikx] 



(104) 



If the equilibrium density follows, e.g., from a rough background potential, $c_{\rm eq}(x) \propto \exp[-U(x)/kT]$, the result predicts that diffusion is much hampered if potential energy differences of a few times $kT$ are present. For the case
of constant D the equation is a classical result, see e.g. 

The general result in more dimensions can also be
found by means of homogenization techniques [H, [25j .
The presented procedure is, however, more general. It 
gives a general recipe for (near-equilibrium) coarse grain- 
ing. It is valid when properties on the fine scales are very 
rough. It also gives the recipe of how to treat a coarse 
graining if there is no wide separation of scales, such that 
homogenization techniques are not valid. 
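
As a numerical illustration of the one-dimensional result, the sketch below evaluates the classical estimate $D_{\mathrm{eff}} = 1/(\langle c_{\mathrm{eq}}\rangle\langle[c_{\mathrm{eq}}D]^{-1}\rangle)$ that corresponds to eq. (104). It rests on assumptions not made in the text: a periodic potential $U(x)/kT = a\cos x$, a constant $D$, and plain grid averages over one period. The function name `effective_diffusion` is hypothetical.

```python
import numpy as np

def effective_diffusion(u_over_kT, D):
    """Estimate D_eff = 1 / (<c_eq> <1/(c_eq D)>) from samples on a uniform grid.

    The grid is assumed to cover whole periods of the potential; c_eq need not
    be normalised because the normalisation cancels in the product.
    """
    c_eq = np.exp(-u_over_kT)
    return 1.0 / (np.mean(c_eq) * np.mean(1.0 / (c_eq * D)))

x = np.linspace(0.0, 2.0 * np.pi, 4096, endpoint=False)  # one period
amplitude = 2.0   # potential energy differences of a few kT (assumed value)
D0 = 1.0          # constant microscopic diffusion coefficient (assumed value)

D_eff = effective_diffusion(amplitude * np.cos(x), D0 * np.ones_like(x))
D_flat = effective_diffusion(np.zeros_like(x), D0 * np.ones_like(x))
print(D_flat, D_eff)
```

For a flat potential the estimate returns the microscopic value, while a barrier of a few $kT$ reduces it markedly, in line with the statement above.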

By means of eq. (87) it also indicates the range of validity of the obtained result. There is a restriction on time-scales. Coarse-grained equations are only useful to study phenomena above a certain length-scale. The non-zero eigenvalues of $\mathcal{M}\cdot A$ are decay rates that appear by means of poles in the inverse Laplace transform. If the degree of coarse-graining is very large the spectrum of eigenvalues is (almost) continuous. In this case one might find, e.g., a power-law region.



C. The instantaneous part 

The instantaneous (ensemble averaged) rate of change, i.e., $E(\dot X|X)$, inherits the phase-space incompressibility property as given by eq. (56). This gives that, under certain restrictions concerning topology and smoothness,

$$E(\dot X|X) = \exp[-S(X)]\,\frac{\partial}{\partial X}\cdot\big(\Omega^{T}\exp[S(X)]\big), \qquad (105)$$

where $\Omega$ is an anti-symmetric matrix. This is a consequence of Stokes' theorem. Upon coarse-graining this matrix transforms as

$$\Omega_Y = E\!\left(\frac{\partial Y}{\partial X}\cdot\Omega\cdot\Big(\frac{\partial Y}{\partial X}\Big)^{T}\,\Big|\,Y\right). \qquad (106)$$

This can be checked by evaluating the coarse-graining from $E(\dot X|X)$ to $E(\dot Y|Y)$ and inserting eq. (105). So if one can find $\Omega$ at one level one can find an expression on any level (and upon a change of variables).

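At the microscopic level Hamilton dynamics holds. In the canonical form this can be written as

$$\dot\Gamma = \frac{\partial}{\partial\Gamma}\cdot\big(L_{\mathrm{micro}}^{T}\,H(\Gamma)\big). \qquad (107)$$

Here $L_{\mathrm{micro}}$ is a constant anti-symmetric matrix. It can be written in a block-diagonal form, where the 2x2 blocks have ±1 as off-diagonal elements. Since, at this level of description, $S(\Gamma) = 0$, this is of the required form. Coarse-graining the microscopic form gives

$$\Omega = E\!\left(\frac{\partial X}{\partial\Gamma}\cdot L_{\mathrm{micro}}\cdot\Big(\frac{\partial X}{\partial\Gamma}\Big)^{T} H(\Gamma)\,\Big|\,X\right). \qquad (108)$$

Note that, because total energy is a conserved quantity, one often chooses the coarse-grained variables such that the total energy can be expressed as function of $X$. In that case $H(\Gamma) = H(X(\Gamma))$ and the energy can be taken out of the expectation value, such that

$$\Omega = L\,H(X), \quad\text{where}\quad L = E\!\left(\frac{\partial X}{\partial\Gamma}\cdot L_{\mathrm{micro}}\cdot\Big(\frac{\partial X}{\partial\Gamma}\Big)^{T}\,\Big|\,X\right). \qquad (109)$$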

For the quantity $L$ one finds a degeneracy condition, similar to the one for $E(\dot X|X)$, eq. (56),

$$\exp[-S(X)]\,\frac{\partial}{\partial X}\cdot\big(L^{T}\exp[S(X)]\big) = 0. \qquad (110)$$

The essential step in the proof is that

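$$\frac{\partial}{\partial X}\cdot\big(L^{T}\exp[S(X)]\big) = \int \frac{\partial X}{\partial\Gamma}\cdot L_{\mathrm{micro}}\cdot\Big(\frac{\partial X}{\partial\Gamma}\Big)^{T}\cdot\frac{\partial}{\partial X}\,\delta[X - X(\Gamma)]\,d\mu_L[\Gamma] = -\int \frac{\partial X}{\partial\Gamma}\cdot L_{\mathrm{micro}}\cdot\frac{\partial}{\partial\Gamma}\,\delta[X - X(\Gamma)]\,d\mu_L[\Gamma]. \qquad (111)$$

The final steps consist of partial integration and using the fact that $L_{\mathrm{micro}}$ is constant and anti-symmetric. Applying the degeneracy condition eq. (110), one can write

$$E(\dot X|X) = L\cdot\frac{\partial H}{\partial X}. \qquad (112)$$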



Note that, when energy can be expressed in terms of $X$ then $H(X)$ can not fluctuate. This results in

$$\frac{\partial H}{\partial X}\cdot\mathcal{M} = 0, \qquad \mathcal{M}\cdot\frac{\partial H}{\partial X} = 0. \qquad (113)$$



Conditions eq. (110) and eq. (113) are the degeneracy conditions of the GENERIC formalism. In the original formulation [1] a simpler degeneracy condition was given for $L$ that was only valid in the thermodynamic limit. This was corrected in [3, eq. (6.163)]. We were not able to prove the GENERIC claim that the instantaneous part is necessarily of a Poisson-bracket form (obeying the Jacobi identity).

Let's repeat the exercise from the previous section, i.e. coarse-graining in the linear case, including a constant $\Omega$. Coarse-graining the instantaneous part is straightforward,

$$\Omega_Y = B\cdot\Omega\cdot B^{T}. \qquad (114)$$



For determining a term — J* fl ■ A ■ Xt' dt' needs to 
be added to eq. ([85]) . One consequence is that 



X. 



-fluct 







n-A-x, 







X, 



i^-fiuct 







(115) 



This has no influence on the derivation since 



i(XoAXf"'^*|Xo) = 0. The final equation we find is 



(116) 



In principle the procedure for finding is the same as 
the method presented in WI Al 

If we assume that the coarse-graining is such that the energy can be expressed as a function of the coarse-grained variables (e.g., because internal energy is a coarse-grained variable), this means that $H(X) = H(Y(X))$. In this case the instantaneous part can be written as,



Y 



dH 
dY 



(117) 



Therefore the instantaneous contribution to eq. (I82p . be- 
comes 



E(yl|A:) -E(^|y) = 

f OA , dY ^(dA 
— El - 



ax 



ax 



\dx 



aY 
ax 



Y 



OH 



(118) 



Note that, for the case of constant $L$ and $B$ the term $E(\dot X|X) - E(\dot X|Y)$ equals zero. Therefore, for this shape of instantaneous part, the coarse-graining does not contribute to $\dot X^{\mathrm{fluct}}$, so the corresponding equation is not influenced. This means that for a (non-constant) $\Omega$ of the form,

$$\Omega(X) = L\,H(Y(X)) + \tilde\Omega, \qquad (119)$$

where $L$ and $\tilde\Omega$ are constant, one finds eq. (116) with $\Omega$ replaced by $\tilde\Omega$.



D. Onsager-Casimir symmetries 

Let's investigate the symmetric and antisymmetric parts of $\mathcal{M}$ somewhat further. Usually $\mathcal{M}$ is taken to be symmetric because one expects it to obey the Onsager relations [26]. Casimir showed that in special cases there can also be an anti-symmetric contribution $A$. We will investigate these claims in our framework, also out of equilibrium.

The reasoning is as follows. Microscopic dynamics are reversible. This means that to any micro state $\Gamma$ a time reversed state can be associated. Let's denote $T$ as the time reversal operator, then for any $t$,

$$\exp[-\mathcal{L}t] = T\exp[\mathcal{L}t]\,T, \qquad (120)$$

(and from this $T^2 = 1$ and $T\mathcal{L} + \mathcal{L}T = 0$). For a micro state $\Gamma$ there is a one-to-one functional relation $T\Gamma = T(\Gamma)$. Using the fact that the microscopic Liouville operator acts as a derivation one finds that for the phase space velocity

$$T\dot\Gamma = -\dot\Gamma\cdot\frac{\partial T}{\partial\Gamma}, \qquad (121)$$

and since $T^2 = 1$,

$$\frac{\partial T}{\partial\Gamma}\cdot\frac{\partial T}{\partial\Gamma} = 1, \quad\text{so that}\quad \det\frac{\partial T}{\partial\Gamma} = \pm 1. \qquad (122)$$

Usually a time-reversal operation corresponds to $T r^i = r^i$ and a change of sign of the momenta, $T p_i = -p_i$.
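
The reversal property can be checked on a small example. The sketch below is an illustration under assumptions not in the text: a single harmonic oscillator with $H = p^2/2 + \omega^2 q^2/2$, whose phase-space flow is a 2x2 matrix acting on $(q, p)$, and $T = \mathrm{diag}(1, -1)$ as the time-reversal map. The matrix identity $T\,F(t)\,T = F(-t)$ is the finite-dimensional analogue of eq. (120) for this linear flow.

```python
import numpy as np

def flow(t, omega=1.0):
    """Flow map of H = p^2/2 + omega^2 q^2/2 acting on the column vector (q, p)."""
    c, s = np.cos(omega * t), np.sin(omega * t)
    return np.array([[c, s / omega],
                     [-omega * s, c]])

T = np.diag([1.0, -1.0])        # time reversal: q -> q, p -> -p
t = 0.7                         # arbitrary time (assumed value)

reversed_flow = T @ flow(t) @ T  # should equal the flow run backwards
print(np.allclose(reversed_flow, flow(-t)))
print(np.allclose(T @ T, np.eye(2)), np.linalg.det(T))
```

The Jacobian of this $T$ squares to the identity and has determinant $-1$, consistent with eq. (122).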

Now one assumes that the coarse-graining is performed such that upon time-reversal, also for the coarse-grained space, there is a functional relation $TX = T^X(X)$. Usually this is established by making the members of $X$ depend only on even or odd powers of the momenta, such that $TX^i = \pm X^i$. One consequence of this definition is that

$$\exp[S(T^X(X))] = \left|\det\frac{\partial T^X}{\partial X}\right|^{-1}\exp[S(X)] = \exp[S(X)], \qquad (123)$$

because, for the same reasons as in the microscopic case, the determinant is ±1. For the expectation values one finds that

$$T\,E(A|X) = E(A|T^X(X)) = \exp[-S(X)]\int A(T(\Gamma))\,\delta[X(\Gamma) - X]\,d\mu_L[\Gamma] = E(TA|X). \qquad (124)$$

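Here we extensively used the fact that $T$ leaves measures (and therefore also the delta distribution) invariant. In terms of projection operators we thus have proved that $T$ and $\mathcal{P}$ commute,

$$T\mathcal{P} - \mathcal{P}T = 0. \qquad (125)$$

Therefore one can pull $T$ through projection operators such that, e.g.,

$$T\exp[\mathcal{Q}\mathcal{L}t] = \exp[-\mathcal{Q}\mathcal{L}t]\,T. \qquad (126)$$

For the instantaneous part we find that upon time-reversal

$$T\,E(\dot X|X) = E(T\mathcal{L}X|X) = -E(\mathcal{L}TX|X) = -E(\mathcal{L}T^X(X)|X) = -\frac{\partial T^X}{\partial X}\cdot E(\dot X|X), \qquad (127)$$

which also gives that

$$\Omega(T^X(X)) = T\Omega = -\frac{\partial T^X}{\partial X}\cdot\Omega(X)\cdot\Big(\frac{\partial T^X}{\partial X}\Big)^{T}. \qquad (128)$$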

For the correlations of the fluctuating contributions 
one finds that 



Let's consider a constant matrix $\mathcal{M}$, i.e., independent of $X$. If the diagonalization is performed such that the +1's are collected on the upper diagonal and the −1's on the lower one, the $\mathcal{M}$-matrix (in the basis provided by the eigenvectors) will have the form

$$\mathcal{M} = \begin{pmatrix} M^{11} & A^{12} \\ -(A^{12})^{T} & M^{22} \end{pmatrix}. \qquad (132)$$

The symmetric part of $\mathcal{M}$ is due to quantities that have the same parity upon time-reversal. The anti-symmetric part is due to the interaction of quantities with opposite parity.

Since, by definition, $\Omega$ is anti-symmetric one gets from eq. (128) for a constant $\Omega$ matrix

$$\Omega = \begin{pmatrix} 0 & \Omega^{12} \\ -(\Omega^{12})^{T} & 0 \end{pmatrix}. \qquad (133)$$

So, also with respect to the Onsager-Casimir symmetry $A$ and $\Omega$ behave the same. Lastly, because the entropy is invariant under time-reversal, we find that for $A$

$$A = \begin{pmatrix} A_{11} & 0 \\ 0 & A_{22} \end{pmatrix}. \qquad (134)$$

Note that the Onsager-Casimir symmetries are strictly valid only for constant matrices $\mathcal{M}$ and $\Omega$. If the matrices are $X$ dependent then the derived relation, eq. (131), relates the matrix at $X$ with the matrix at $T^X(X)$. If this relation is simple, say entries of $\mathcal{M}(T^X(X))$ are ± entries of $\mathcal{M}(X)$, one can derive generalized Onsager-Casimir relations. In [3] these relations are called "dressed" Onsager-Casimir symmetries.
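
The block structure for a constant matrix can be made concrete numerically. The sketch below is an illustration under stated assumptions: three coarse-grained variables with parities $(+1, +1, -1)$, a diagonal parity matrix `eps` standing in for $\partial T^X/\partial X$, and the constant-matrix Onsager-Casimir relation taken in the form $\mathcal{M} = \epsilon\cdot\mathcal{M}^{T}\cdot\epsilon$. An arbitrary matrix is projected onto matrices obeying this relation, after which the symmetric and anti-symmetric parts are inspected.

```python
import numpy as np

parity = np.array([1.0, 1.0, -1.0])   # two even variables, one odd (assumed)
eps = np.diag(parity)                 # stands in for dT^X/dX in diagonal form

rng = np.random.default_rng(0)
raw = rng.normal(size=(3, 3))

# Project an arbitrary matrix onto those obeying M = eps . M^T . eps.
M = 0.5 * (raw + eps @ raw.T @ eps)

sym = 0.5 * (M + M.T)                 # symmetric part
anti = 0.5 * (M - M.T)                # anti-symmetric part
same_parity = np.outer(parity, parity) > 0

print(np.round(sym, 3))
print(np.round(anti, 3))
```

The symmetric part only couples variables of equal parity and the anti-symmetric part only couples variables of opposite parity, which is the pattern of eq. (132).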



'dX 



g^^tluct j.:r(J^ttuct)|^)^ (129) 



Here we used to following relations, 

^^fluct ^ -j-Q^ exp[Q£r]Xo 

= -Qjr exp[Q£T]T"(Xo) 
= -QCT^iX^r') 

= -r^(x«';'='). 

Using a similar approach as at eq. (68), i.e. assuming that during a typical decorrelation of the fluctuating contribution the change of $X$ is small, one finds that

$$\mathcal{M}(T^X(X)) = \frac{\partial T^X}{\partial X}\cdot\mathcal{M}^{T}(X)\cdot\Big(\frac{\partial T^X}{\partial X}\Big)^{T}. \qquad (131)$$

The matrix $\partial T^X/\partial X$ can always be diagonalized with ±1 on the diagonals. These eigenvalues indicate the parities upon time reversal. Usually the natural choice of variables is such that the matrix has this diagonal form.



E. Instantaneous, reversible, isentropic 

In our theory, at the level of the stochastic differential equation, we have three matrices $\Omega$, $A$ and $M$. The matrix $\Omega$ characterizes the instantaneous response of the system (averaged over all microstates with $X(\Gamma) = X$). Upon coarse-graining it transforms according to eq. (106).

The matrices $A$ and $M$ follow from the time-correlation of the fluctuations. They are non-instantaneous. For the special case where all terms are linear, eq. (116) gives the auto-correlation of fluctuations upon coarse-graining. The result is given in terms of a Laplace transform which can be used to obtain the coarse-grained value of $\mathcal{M}$. Upon coarse-graining of $A$ and $M$ time-correlations enter. The use of a stochastic differential equation for the further coarse-grained equation is only a good approximation if the decorrelation time corresponding to these fluctuations is small enough.

Sometimes the instantaneous term, corresponding to $\Omega$, is called the reversible term. We think that the combined term $\Omega + A$ deserves this name. Both terms are anti-symmetric. From the Onsager-Casimir symmetries both terms can be identified as reversible. If we write



-A')exp[S{X)]], (135) 



i:--- = exp[-5(X)]A.((n^ 



this reversible contribution obeys 

eM~S{X)]-^ ■ (ir--- exp[5(X)]) = 0, (136) 

as do the terms individually. This can be seen as a gen- 
eralization of the Liouville-theorem. It is also a gener- 
alization of the isentropic evolution of reversible motion. 
In fact, as we will discuss below, the isentropic condition 
follows from this when the thermodynamic limit is valid. 
If one splits O, as 



n^n + L-H{x), 



(137) 



■H 



then 

d 

dX 

+ eM~S{X)] A . ((fi^ + A^) exp[5(X)]) (138) 

For this form eq. (136) can be seen to hold by invoking eq. (110) and the anti-symmetry of $L$.

The remaining term $M$ gives rise to irreversible motion (and fluctuations),

$$dX^{\mathrm{irr}} = \exp[-S(X)]\,\frac{\partial}{\partial X}\cdot\big(M\exp[S(X)]\big)\,dt + \sqrt{2M}\cdot dW. \qquad (139)$$

The matrix $M$ is symmetric and positive semi-definite. Terms relating quantities of opposite parity are zero. In the thermodynamic limit this term reduces to a dissipative term $M\cdot\partial S/\partial X$ that always has a positive entropy production. For smaller systems there can be negative entropy production (but not on average).

The full stochastic differential equation then becomes 



dXt = dt + dX" 



(140) 



Note that there is a very curious situation now. In this equation $\Omega + A$, or at least $\tilde\Omega + A$, appears as one term. Both $\Omega$ and $A$ seem to have similar properties. If one however looks at expressions for a more coarse-grained situation, say $\Omega_Y$ and $A_Y$, the matrices $\Omega$ and $A$ enter in a different way. One would hope that the expression for $\Omega_Y + A_Y$ would depend on the sum $\Omega + A$, but not on $\Omega$ and $A$ individually. We were not able to prove this.

We thus have the curious result that a class of stochastic differential equations for $Y$ are all consistent coarse-grainings of the same stochastic differential equation for $X$, but correspond to different microscopic dynamics. This situation is connected with the fact that in a stochastic differential equation $\dot X$ is undetermined. Therefore one is unable to uniquely determine the anti-symmetric part of $\mathcal{M}$. This was one of the reasons that in §VI A we did not take a stochastic differential equation as starting point. In many situations one puts $A = 0$ from the beginning. However, upon coarse-graining, a non-symmetric contribution can pop up from the contributions of the instantaneous part (see eq. (116)). The situation remains a bit unsatisfactory from a conceptual point of view.



VII. NON-EQUILIBRIUM THERMODYNAMICS 

The goal of non-equilibrium thermodynamics is to sup- 
ply a description of the time-evolution of a system in 
terms of coarse-grained, meso- or macroscopic, variables. 
The generalized non-linear Langevin equation, after approximation for the fluctuating forces, supplies such a
description.

Therefore the derived equations provide a description 
that can be called non-equilibrium thermodynamics. The 
theory deserves this predicate because entropy appears in 
it, and plays a central role. The entropy that appears in 
the theory is a microcanonical entropy. This is the basic 
definition. It is defined as (the logarithm of) a density 
of states. Therefore, upon coordinate transformation,
eq. (75), it does not transform as a scalar. One might
object that this is not the way entropy should behave. One
might think that entropy should be a scalar. Besides the
constructive derivation, we have shown, however, that 
this is exactly how entropy should behave to preserve 
the general form of the equations when fluctuations are 
present. 



A. The thermodynamic limit 

In the thermodynamic limit entropy is expected to behave as a scalar. The thermodynamic-limit behavior of entropy, starting from the microcanonical entropy definition, is well understood using large-deviation theory.

We will here give an outline of a personal interpretation of these results. The central quantity in statistical mechanics is the sum of states. The (generalized) sum of states is just the Laplace transform of the density of states,

$$Z(\lambda) = \int \exp[S(X)]\exp[-\lambda\cdot X]\,dX = \int\!\!\int \delta[X - X(\Gamma)]\exp[-\lambda\cdot X]\,dX\,d\mu_L[\Gamma] = \int \exp[-\lambda\cdot X(\Gamma)]\,d\mu_L[\Gamma]. \qquad (141)$$

The (generalized) grand-potential is then defined as

$$\Phi(\lambda) = -\ln Z(\lambda). \qquad (142)$$

Note that, by definition,

$$-\frac{\partial^2\Phi(\lambda)}{\partial\lambda\,\partial\lambda} = \frac{1}{Z(\lambda)}\int \big(X(\Gamma) - \bar X\big)\big(X(\Gamma) - \bar X\big)\exp[-\lambda\cdot X(\Gamma)]\,d\mu_L[\Gamma], \qquad (143)$$

is positive (semi)-definite for real $\lambda$. This means that $\Phi(\lambda)$ is a concave function of $\lambda$. In the formula we used the definition

$$\bar X = \frac{\partial\Phi(\lambda)}{\partial\lambda}. \qquad (144)$$



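Large systems can be decomposed into $N$ more or less independent, equivalent, sub-systems. Let us consider an extensive quantity, $X$, that is the sum of the values $X_{\mathrm{sub}}$ attained in these sub-systems. For

$$X(\Gamma) = \sum_{\alpha} X_{\mathrm{sub}}(\Gamma^{\alpha}) \qquad (145)$$

one has,

$$Z_N(\lambda) = \int \exp\Big[-\lambda\cdot\sum_{\alpha} X_{\mathrm{sub}}(\Gamma^{\alpha})\Big]\prod_{\alpha} d\mu_L[\Gamma^{\alpha}] = \big(Z_{\mathrm{sub}}(\lambda)\big)^{N}. \qquad (146)$$

If the $N$ subsystems are equivalent one thus finds that

$$\Phi_N(\lambda) = N\,\Phi(\lambda), \qquad (147)$$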



where we take $\Phi(\lambda)$ to indicate the thermodynamic potential of a subsystem. Note that such a relation can not be written for the entropy. In the case of entropy one needs to consider the convolution, i.e., also count states where quantities are not evenly distributed over the subsystems. Therefore the Laplace transform is a convenient tool here: it transforms convolutions into products.
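
A discrete toy version makes the statement explicit. The sketch below uses made-up numbers: a subsystem whose extensive quantity takes the values 0, 1, 2 with an assumed density of states `g_sub`. The density of states of $N$ independent subsystems is built by repeated convolution, and the discrete analogue of eq. (141) is then compared with $(Z_{\mathrm{sub}})^N$ and $N\Phi$.

```python
import numpy as np

# Density of states of one subsystem for X_sub = 0, 1, 2 (illustrative numbers).
g_sub = np.array([1.0, 3.0, 2.0])
N = 5

# Density of states of N independent subsystems: an N-fold convolution.
g_N = np.array([1.0])
for _ in range(N):
    g_N = np.convolve(g_N, g_sub)

def sum_of_states(g, lam):
    """Discrete Laplace transform sum_X g(X) exp(-lam * X)."""
    X = np.arange(len(g))
    return np.sum(g * np.exp(-lam * X))

lam = 0.4
Z_sub = sum_of_states(g_sub, lam)
Z_N = sum_of_states(g_N, lam)
Phi_sub, Phi_N = -np.log(Z_sub), -np.log(Z_N)
print(Z_N, Z_sub ** N, Phi_N, N * Phi_sub)
```

The entropy $\ln g_N(X)$ itself is not $N$ times a subsystem entropy, whereas the potential is additive.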

If one now performs the inverse Laplace transform one finds that

$$\exp[S_N(X)] = \frac{1}{(2\pi i)^d}\int_{0^+-i\infty}^{0^++i\infty}\exp[N(-\Phi(\lambda) + \lambda\cdot x)]\,d\lambda \approx \exp[N(-\Phi(\lambda) + \lambda\cdot x)]\left[\det\left(-2\pi N\,\frac{\partial^2\Phi(\lambda)}{\partial\lambda\,\partial\lambda}\right)\right]^{-1/2}. \qquad (148)$$

Here $x = X/N$ and the corresponding $\lambda$ follows from eq. (144) for $\bar X = x$. This result is obtained by considering the dominant contribution to the integral, i.e. where eq. (144) is valid. There a second order approximation of the integrand is used. Here we purposely left in the determinant term to illustrate that also here the non-scalar behavior of $S_N(X)$ is apparent. For large $N$ one finds that

$$\frac{S_N(X)}{N} = -\Phi(\lambda) + \lambda\cdot x + O\!\left(\frac{\ln N}{N}\right). \qquad (149)$$

The limit to a very large system composed of many subsystems,

$$s(x) = \lim_{N\to\infty}\frac{S_N(X)}{N}, \qquad (150)$$

is often called "the thermodynamic entropy". We do not want to restrict ourselves to the thermodynamic limit. Therefore this definition is too restrictive for our purposes. We will stick to calling $S(X)$ the entropy. We will call $s(x)$ the "thermodynamic-limit entropy". The thermodynamic-limit entropy straightforwardly follows from this.
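
The size of the finite-$N$ correction can be seen in the simplest discrete case. The sketch below assumes $N$ independent two-state subsystems, so that $\exp[S_N(X)]$ is a binomial coefficient and the thermodynamic-limit entropy is $s(x) = -x\ln x - (1-x)\ln(1-x)$; this model is an illustration and is not taken from the text.

```python
import math

def S_N(N, X):
    """Microcanonical entropy ln C(N, X) of N two-state subsystems."""
    return math.lgamma(N + 1) - math.lgamma(X + 1) - math.lgamma(N - X + 1)

def s_limit(x):
    """Thermodynamic-limit entropy per subsystem of the two-state model."""
    return -x * math.log(x) - (1 - x) * math.log(1 - x)

x = 0.3
sizes = (10, 100, 1000, 10000)
gaps = {N: s_limit(x) - S_N(N, round(x * N)) / N for N in sizes}
for N in sizes:
    # The gap should shrink like ln(N)/N; N * gap grows only logarithmically.
    print(N, gaps[N], N * gaps[N])
```

The difference between $S_N/N$ and $s(x)$ vanishes for large $N$, while $N$ times that difference keeps growing slowly, which is the $O(\ln N)$ non-extensive remainder.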

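The thermodynamic-limit entropy has the famous properties such as concavity, extensivity etc. It obeys the ordinary thermodynamic rules. The thermodynamic potential and $s(x)$ are related by the Legendre transform

$$s(x) = \lambda\cdot x - \Phi(\lambda), \qquad x = \frac{\partial\Phi(\lambda)}{\partial\lambda}, \qquad \lambda = \frac{\partial s(x)}{\partial x}. \qquad (151)$$

A useful relation, which we will use later on, that can be derived from this is

$$\frac{\partial x}{\partial\lambda} = \Big(\frac{\partial\lambda}{\partial x}\Big)^{-1}, \quad\text{i.e.}\quad \frac{\partial^2\Phi}{\partial\lambda\,\partial\lambda} = \Big(\frac{\partial^2 s}{\partial x\,\partial x}\Big)^{-1}. \qquad (152)$$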



Note that we expressed the entropy, $S$, as function of the extensive quantity $X$. If we want to express the entropy as function of $x = X/N$ we find, according to the transformation rule of the entropy, eq. (75),

$$S_N(x) = S_N(X) + d\ln N = N\,s(x) + O(\ln N) \qquad (153)$$

(here $d$ is the dimension of the coarse-grained space). So the real entropy, also as function of a density, is $N$ times the thermodynamic limit density, also if one uses intensive variables.

The story told seems to be quite generally valid. It is worthwhile to investigate where it breaks down. A crucial step takes place at the approximation of the inverse Laplace transform, eq. (148), using the second order approximation of the term in the exponential. This is not allowed when $\Phi$ is non-analytic at $\lambda$.

When does this occur? It can occur when $s(x)$ is non-concave, bimodal for example. When starting with a non-concave thermodynamic-limit entropy $s(x)$ and inserting $S(X) = N s(X/N)$ into eq. (141) one finds that in the limit $N \to \infty$

$$\Phi(\lambda) = \inf_x\{\lambda\cdot x - s(x)\}. \qquad (154)$$



The reason is that the largest term in the exponent dominates the integral. This is called the Legendre-Fenchel transform. When $s(x)$ is concave this transform gives the same result as the Legendre transform. If $s(x)$ is non-concave only those $x$'s of the domain where $s(x)$ coincides with the convex hull of $s(x)$ play a role. Still, $\Phi(\lambda)$ stays concave, since this is a general property independent of the thermodynamic limit. However, when computing $\Phi(\lambda)$ using the infimum, sometimes the minimizing $x$ jumps from one region to the other upon a small change of $\lambda$. At these $\lambda$'s there is a non-analyticity in $\Phi(\lambda)$.
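
The effect of the Legendre-Fenchel transform on a non-concave entropy can be sketched on a grid. The example below assumes a made-up bimodal function $s(x) = -(x^2 - 1)^2$ on $[-2, 2]$ and brute-force minimization over discretized $x$ and $\lambda$; neither the function nor the grid sizes come from the text. Applying the transform twice returns the concave envelope of $s$, which differs from $s$ exactly in the region between the two maxima.

```python
import numpy as np

x = np.linspace(-2.0, 2.0, 801)
s = -(x**2 - 1.0) ** 2                # bimodal, hence non-concave (assumed form)

lam = np.linspace(-30.0, 30.0, 1201)  # covers the slopes of s on this interval

# Phi(lam) = inf_x { lam * x - s(x) }, evaluated by brute force on the grid.
Phi = np.min(lam[:, None] * x[None, :] - s[None, :], axis=1)

# Transforming back gives the concave envelope of s.
s_hull = np.min(lam[:, None] * x[None, :] - Phi[:, None], axis=0)

mid = len(x) // 2                     # grid point at x = 0
print(s[mid], s_hull[mid])
```

At $x = 0$ the original function has a dip, whereas the back-transformed function stays at the level of the two maxima; the information about the dip is lost in $\Phi(\lambda)$.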

Now consider the case that the entropy of a finite subsystem, $S_{\mathrm{sub}}(X)$, is non-concave. If $S_{\mathrm{sub}}(X)$ is well behaved then $Z(\lambda)$ is analytic (and larger than 0) in $\lambda$. If one now finds that subsystems are (more or less) independent, then for large $N$, $-N\ln Z(\lambda)$ is also analytic. As a consequence, $s(x)$ will be found to be concave. The general conclusion is that the entropy $s(x)$ is concave unless the system is non-extensive in the thermodynamic limit.

Put the other way around, the proof of whether a system has a concave thermodynamic limit boils down to proving that the system is extensive. One can rigorously prove this if, e.g., potentials are sufficiently short ranged poi . \2l\ . Systems that can show non-concave behavior in the thermodynamic limit are typically systems with long-ranged interactions. The most notorious class is gravitational systems. Here one finds non-standard thermodynamics, such as negative heat capacities [27].

For many short ranged systems one has proved that the entropy is concave. However, at phase transitions they can behave non-extensively. In this case systems can be heterogeneous or have long-ranged correlations. These two conditions can be reconciled by noticing that for these states $s(x)$ has affine patches where

$$\det\frac{\partial^2 s(x)}{\partial x\,\partial x} = 0. \qquad (155)$$



In these cases $s(x)$ is not a good approximation of $S(X)/N$. When $s(x)$ is very flat, the perturbations due to finite system size determine the behavior. They determine whether $S(x)$ behaves bimodally, by means of a "convex intruder", or concavely. The non-extensive terms dominate the dynamics and the structure. For finite size systems in these situations there is a difference between the microcanonical ensemble and the canonical one [1^. In this case one has to take non-extensive contributions to the entropy into account. In modeling, this non-extensive behavior is often accounted for by entropy (or free energy) contributions attributed to interfaces or spatial correlations. Depending on the magnitude of these terms, fluctuations can still be neglected (e.g., two-phase macroscopic flow), or are dominant (critical phenomena). This is the realm where mesoscopic modeling is often applied. Because it is often hard to model the structure in a macroscopic fashion one remains on a level of description where the structure occurs.



1. Interpretation of the entropy definition 

We think it might be helpful to comment on our entropy definition. Our definition is objective, but depends on the variables $X$ used to describe the system. It is the logarithm of the density of states (Liouville measure per Lebesgue measure of $X$) corresponding to the state $X$. It counts all the microscopic states corresponding to a state $X$, so not only states that are sampled in a certain time or something like that.

To illustrate this point let us look at a consequence of this definition. So let's assume that the coarse-grained state $X$ contains information on number of particles, momentum etc. In this case the number of particles in a certain volume $V_i$ can be computed from $X$ to be $M_i(X)$ ($i = 1 \cdots N$). Now consider the macroscopic entropy definition. Let $\Gamma^{\mu}$ be the phase space coordinates of particle $\mu$ (particle position and momentum). Assume that the macro state does not change under the interchange of particles, so $X(\ldots, \Gamma^{\mu}, \ldots, \Gamma^{\nu}, \ldots) = X(\ldots, \Gamma^{\nu}, \ldots, \Gamma^{\mu}, \ldots)$. When computing $\exp[S(X)]$ one needs to integrate each $\Gamma^{\mu}$ over the whole spatial and momentum domain. Using the symmetry of $X$ this can be simplified.



M 



exp[5(A)] = / fj dAiL(r^)<5[X - A(F)] 

M\5[X - X{T)] 



' Mi{x)\---MN{xy: 

(156) 



i.e., one restricts the integration by Mi particles per vol- 
ume {CMi denotes the cumulative sum up to i) and 
accounts for all possible permutations by means of the 
multinomial coefficient. Note that if the subsystems are 
sufficiently independent, such that the thermodynamic 
limit is valid for each cell individually, then 



^exp[5(A)] 



n 



1 



M,(A) 



cxp[5(A^)]. 



(157) 



The full state of the system $X$ is characterized by the
states of the subsystems $X = (X^1, \ldots, X^i, \ldots)$. In this
case $X^i$ can be the extensive state of cell $i$, e.g., the
total particle number, momentum and energy associated
with cell i. Here we use "associated" because not all 
quantities are fully localized. For example, if one has 
pair-interactions, part of the energy is due to interaction 
of particles in neighboring cells. Now one could account 
for this interaction energy by associating half of it with 
each of the neighboring cells. 

This explains the $1/M_i!$ factor in the usual entropy definition. The treatment as presented here resolves the Gibbs paradox. When considering more than one species of particles, one introduces a distinction between these species by means of $X$. One can, for example, use the density of each species. This means that $X$ is only invariant under exchange of particles of the same species. Therefore the combinatorial factor is different. For a discussion of the Gibbs paradox and a solution of it along similar lines see [28].

It is probably also clarifying to comment on the status of the Gibbs entropy within this framework. This entropy, under the name of relative entropy, arises naturally in the context of large-deviation theory. It is only valid in the thermodynamic limit. This point of view is most easily explained in the discrete case. Let's consider a large number, $N$, of independent subsystems and discrete states. Now $p_\alpha$ counts the fraction of subsystems in state $\alpha$. Consider the total entropy of a state $X = N\sum_\alpha p_\alpha X_\alpha$ (here $X_\alpha$ is the value of $X$ corresponding to a state $\alpha$ and not the state of a subsystem with index $\alpha$). By counting the number of subsystems in the same state one can express the entropy by considering all permutations of subsystems in different states,

eMS{X)] - Y{ (Npa)l -NY^pM 

xexp[7V^p„^(X,)]. (158) 

a 

The Gibbs entropy is an approximation of the multinomial factor. Consider the Gibbs entropy,

$$S_G = -\sum_\alpha p_\alpha \ln\big(p_\alpha \exp[-S(X_\alpha)]\big) \qquad (159)$$

(in the discrete case $\exp[S_\alpha]$ corresponds to the Liouville measure of the discrete state $\alpha$). Maximizing $S_G$ under the constraint that $X = N\sum_\alpha p_\alpha X_\alpha$ gives the dominating term in the sum.
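
How well the $-N\sum_\alpha p_\alpha\ln p_\alpha$ term approximates the logarithm of the multinomial factor can be checked directly. The sketch below uses assumed occupation numbers for three discrete states; the helper names `log_multinomial` and `gibbs_term` are hypothetical.

```python
import math

def log_multinomial(counts):
    """ln( N! / prod_a n_a! ) for occupation numbers n_a with N = sum_a n_a."""
    N = sum(counts)
    return math.lgamma(N + 1) - sum(math.lgamma(n + 1) for n in counts)

def gibbs_term(counts):
    """-N sum_a p_a ln p_a with p_a = n_a / N (Stirling approximation)."""
    N = sum(counts)
    return -N * sum((n / N) * math.log(n / N) for n in counts if n > 0)

counts = [5000, 3000, 2000]   # illustrative occupation numbers
exact = log_multinomial(counts)
approx = gibbs_term(counts)
rel_err = abs(exact - approx) / exact
print(exact, approx, rel_err)
```

The relative error is small for large occupation numbers and grows when the subsystem count becomes small, which is why the Gibbs form is tied to the thermodynamic limit.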

The computation makes no sense if the ensemble is purely fictional. There have to be real possibilities to distribute the extensive quantity over $N$ subsystems. A similar interpretation of the different entropy definitions can be found in [30] and [31].



B. Expectation values in the thermodynamic limit 

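Let us consider the case where,

$$x(\Gamma) = \frac{1}{N}\sum_\alpha x_{\mathrm{sub}}(\Gamma^\alpha), \qquad a(\Gamma) = \frac{1}{N}\sum_\alpha a_{\mathrm{sub}}(\Gamma^\alpha). \qquad (160)$$

For large $N$ we want to know the expectation value $E(a|x)$. To compute this value we can use the Laplace-transform trick:

$$\int \exp[S(x)]\,E(a|x)\exp[-N\lambda\cdot x]\,dx = Z_{\mathrm{sub}}(\lambda)^{N}\,Z_{\mathrm{sub}}(\lambda)^{-1}\int a_{\mathrm{sub}}(\Gamma^\alpha)\exp[-\lambda\cdot x_{\mathrm{sub}}(\Gamma^\alpha)]\,d\mu_L[\Gamma^\alpha] = \exp[-N\Phi(\lambda)]\,\big(a_{\mathrm{sub}}, \mu^{c}(\lambda)\big). \qquad (161)$$

Here eq. (48) was used. The measure $\mu^{c}(\lambda)$ indicates the generalized canonical probability measure

$$d\mu^{c}(\lambda)[\Gamma^\alpha] = Z_{\mathrm{sub}}^{-1}(\lambda)\exp[-\lambda\cdot x_{\mathrm{sub}}(\Gamma^\alpha)]\,d\mu_L[\Gamma^\alpha]. \qquad (162)$$

Inverting this relation by means of the inverse Laplace transform gives, for large $N$ and analytic $\Phi(\lambda)$, a concentration at the $\lambda$ that is related to $x$ by means of eq. (151). For $N \to \infty$ the expectation value is therefore well approximated by the canonical expectation value,

$$E(a|x) = \big(a_{\mathrm{sub}}, \mu^{c}(\lambda)\big) = \langle a_{\mathrm{sub}}\rangle_\lambda. \qquad (163)$$

The deviation is $O(N^{-1})$. Below, we will also need to consider the expectation value of a product, $E(ab|x)$, where $b$ is of a similar form as $a$. In this case one finds that

$$N^{-2}\sum_\alpha\sum_\beta\int a_{\mathrm{sub}}(\Gamma^\alpha)\,b_{\mathrm{sub}}(\Gamma^\beta)\exp\Big[-\lambda\cdot\sum_\gamma x_{\mathrm{sub}}(\Gamma^\gamma)\Big]\prod_\gamma d\mu_L[\Gamma^\gamma] = Z_{\mathrm{sub}}(\lambda)^{N}\Big(N^{-1}\langle a_{\mathrm{sub}}\,b_{\mathrm{sub}}\rangle_\lambda + (1 - N^{-1})\langle a_{\mathrm{sub}}\rangle_\lambda\langle b_{\mathrm{sub}}\rangle_\lambda\Big). \qquad (164)$$

As a result we find that

$$E(ab|x) = \langle a_{\mathrm{sub}}\rangle_\lambda\,\langle b_{\mathrm{sub}}\rangle_\lambda + O(N^{-1}) = E(a|x)\,E(b|x) + O(N^{-1}). \qquad (165)$$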

C. Successive Coarse-graining of extensive systems 

We will consider systems that are well inside the thermodynamic limit. In general, assuming that the entropy (near the thermodynamic limit) obeys relation eq. (157) is not a good idea. In many situations, coarse-grained variables are not neatly localized in non-overlapping cells. One might think of the situation where one represents a state using base functions, such as in the finite element method, or in a Fourier representation. What we will assume is that on some, relatively, fine scale a system can be written by a suitable choice of variables in the form of eq. (157).

The level we are considering is a coarse-graining of this (thermodynamic limit) finer level. At such a coarse-grained level, one can approximate the entropy well by

$$S(x) = N s(x), \qquad (166)$$

where $N$ can be considered a large variable. Differently from §VII A, where a homogeneous situation was considered, here elements can denote the state of well separated regions in space, or states corresponding to different wave-vectors etc. The reason this expression is valid is that it can be derived from coarse-graining a finer scale that obeys eq. (157). On the finer level the equation is valid for the individual subsystems (that themselves can be divided into $N$ independent subsystems). Below we will prove that once the entropy has the form eq. (166) it will remain of this form upon coarse-graining.

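In the thermodynamic limit degeneracy conditions simplify; for example, eq. (136) becomes

$$\exp[-N s(x)]\,\frac{\partial}{\partial x}\cdot\big(\dot x^{\mathrm{rev}}\exp[N s(x)]\big) = \frac{\partial}{\partial x}\cdot\dot x^{\mathrm{rev}} + N\,\dot x^{\mathrm{rev}}\cdot\frac{\partial s}{\partial x} = 0 \quad\Rightarrow\quad \dot x^{\mathrm{rev}}\cdot\frac{\partial s}{\partial x} = 0, \qquad (167)$$

because $N$ is extremely large. Therefore the (generalized) Liouville theorem gives rise to the isentropic condition for reversible motion in the thermodynamic limit.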

When successively coarse-graining this state further we will consider states $y = y(x)$. It is convenient to work with intensive variables, such as mass, momentum and energy densities. The reason is that when the variation is little, the coarse-grained values are close to the original ones. The entropy of this coarse-grained scale then follows as,



exp[S'(2/)] = j 5[y- y{x)] exp[S'(x)] dx 



5[y — y{xy\ exp[Af ■^(a;)] dx 



[x — x) 
d^s{x) 



dy{x) 



exp 



Siy) 



dxdx 
N s{x) + 0{\-siN) 



dx 

[x — x){x — x) 



Ns{x) 



(168) 



dx 



s{y) = s{x). 



This proves that the entropy remains of the shape eq. (166). The integral is dominated by the maximum thermodynamic-limit entropy under the constraint that $y(x) = y$. Because $s(x)$ is concave this maximum is reached at a unique value $x = \bar x$. The value can be found by determining the unique values of $\bar x$ and $\lambda_y$ for which two equations hold simultaneously, namely

$$\frac{\partial s(\bar x)}{\partial x} = \lambda_y\cdot\frac{\partial y(\bar x)}{\partial x} \quad\text{and}\quad y(\bar x) = y. \qquad (169)$$

Using the Legendre transform from the entropy to the thermodynamic potential, $\Phi(\lambda) = \lambda\cdot x - s(x)$, the equations can also be restated as



dyjx) 
dx 



and yiXy) 



(170) 



For this state $\bar x$ corresponding to the maximum (constrained) entropy we thus have $s(y) = s(\bar x)$. The relation between coarse-grained thermodynamic potentials is a bit less straightforward,

$$\Phi_y(\lambda_y) = \lambda_y\cdot y - s(y) = \Phi(\lambda) + \lambda_y\cdot\Big(y - \frac{\partial y(\bar x)}{\partial x}\cdot\bar x\Big). \qquad (171)$$



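A more elegant relation between the coarse-grained thermodynamic potentials, which will be used below, follows from the definitions as,

$$\frac{\partial^2\Phi_y}{\partial\lambda_y\,\partial\lambda_y} = \frac{\partial y}{\partial x}\cdot\frac{\partial^2\Phi}{\partial\lambda\,\partial\lambda}\cdot\frac{\partial\lambda}{\partial\lambda_y}. \qquad (172)$$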



Using eq. (152) this relation can also be used to relate the (inverse) second derivatives of the thermodynamic-limit entropy. The expression for the derivative of the thermodynamic driving forces $\lambda$ is a bit more troublesome, namely



dX 
dXv 



1 - A, 



d'^y 



dxdx 

resulting, using eq. (|152p . in 



dh 



dXydXy dx \dxdx 
\dydyJ 



dXdX) 



d^y 



dy_ 

dx 



(173) 



dxdx 



dy_ 

dx 



(174) 



Both expressions, eq. (jl7ip and eq. (|174p are compli- 
cated by the occurrence of the second derivative of y{x). 
In practice most transformations, in the thermodynamic limit, are linear. The reason is that one looks at coarse-graining extensive variables (or densities). In this case only weighted averaging, which is a linear transformation, really makes sense as coarse-graining. Therefore we will assume, in the following, that

$$\frac{\partial y}{\partial x} = B, \qquad (175)$$

is a matrix independent of $x$. If one should encounter a situation where this assumption is not valid the analysis below should be redone including the complication. This is straightforward, but a little bit more involved.

In the thermodynamic limit expectation values are
concentrated. Expectation values are dominated by
the state of maximum thermodynamic-limit entropy obeying the
constraint $y = y(x)$, so for functions $A(x)$,

\[
\mathrm{E}(A|y) \approx \mathrm{E}(A|\bar{x}) = A(\bar{x}). \tag{176}
\]



For the special case of the instantaneous rate of change,
one finds that

\[
\mathrm{E}(\dot{y}|y) \approx B \cdot \mathrm{E}(\dot{x}|\bar{x}). \tag{177}
\]



If one wants to express the result using $\Omega$, eq. (106), one
gets

\[
\Omega_y(y) = B \cdot \Omega(\bar{x}) \cdot B^T. \tag{178}
\]

If the energy is extensive (if interactions are short-ranged),
one can use eq. (165) on the definition of $\Omega$,
eq. (106). One finds that



n{x) ^ L{x) ■ ¥.{H\x) + 0{N-^). 
Let us name this C'(A^~"'^)-terni: 
/ dx ^ dx 



nix) 



dV 



{H{T)-nH\x)) 



(179) 



(180) 



This term is zero if the total energy can be expressed in
terms of the coarse-grained variables, i.e., if $H = \mathrm{E}(H|x)$.
Using this decomposition we obtain, from eq. (105), that



E{x\x) 



dx 



|a;)+exp[-5(a;)]^ • (vt^ exp[5(a;)] 
ax \ 

(181) 

Here we used property eq. (110).

If a cell consists of N independent subsystems the en- 
ergy is extensive (if interactions are short ranged). The 
microscopic L matrix can be well approximated by means 
of a block-diagonal form. Since we are interested in scal- 
ing behavior we will assume that subsystems are fully 
statistically independent, 



r-(ri,...,r^) 



x{T) 



H{T) = HiT"') = N'^ ^ N H{T° 



(182) 



.(r^) 



i'micro(r) — L 

This gives that 

H{x) = N h{x) and L{x) ^ N^^ l{x) 



(r^) 



(183) 



Combining this with the earlier observations that 



^(a;) = 7Vs(x) and $7(2;) = 7V~itD(x), (184) 



we find that 



E(i|x)=Z-5+ii-|^-f 0(7V-i). (185) 
ox ox 



In this thermodynamic limit the conditions, eq. (56)
and eq. (110), reduce to

\[
\mathrm{E}(\dot{x}|x) \cdot \frac{\partial s}{\partial x} = 0
\quad\text{and}\quad
l \cdot \frac{\partial s}{\partial x} = 0. \tag{186}
\]

The instantaneous rate of change is isentropic in the thermodynamic
limit. Because of the anti-symmetry of $\omega$ this
term also is isentropic, and the second equation implies
the first one.



We expect that, in eq. (67), only local contributions
correlate. In the thermodynamic limit we therefore ex- 
pect that M — rh/N is the appropriate scaling with N 
in the thermodynamic-limit regime. Using this relation 
we obtain that 



exp[-A^s(a;)]^ • (^exp[iV s(x)] M^) w m • (187) 



This relation becomes exact in the thermodynamic limit.
When we insert this relation into eq. (82) with $A = x$ this
gives



^xf = / [l{x\; ) ■ - l{xl; ) ■ ^ dt + [u:ixl' ) • —j^, - ^{x\; ) ■ — ^ j dt 

Jo ^ ox^, ox", ' Jo ^ ox", ox", ' 

+ /*(m,_,(x-«"-) . - m,_,(x-«-) . dt' + Axr\ (188) 

Jo ^ oXf, oXf, ' 



Note that x can always be computed from x^ since x 
follows from yix). Therefore the equation is a closed 
equation. This equation tells us that x\'' "'^ will be 
driven toward x\' "'^ . The difference between the two 
is kept away from zero by Ax^"'^*. Since this fluctuating 
contribution approaches for — > oo it can be con- 
sidered a perturbation. We expect Axf"'^* — 0{N~^), 
which is dominant compared to other ^^(iV"^ In A^) con- 
tributions. Therefore Axj"'^' is the only perturbation 



that needs to be considered. We can make a first order 
Taylor expansion around the initial value Xq'^"'^' = xq. 
I.e., xl



y,fluct 



A6x, = ^^{li,o)-§^)-j\sx,-Sx,)dt' 

+ -^(w(xo) • ^) • / iSxf - 6xt') dt' 
dxo V oxo ' Jo 



Axf"'^*. (189) 



The value of Sxt follows from Sxt as, 

Sxt = ■ B ■ A" ■ B ■ 6xt, 
where the definitions, 



(190) 



Aixo) 



and A^(a;o) 



dxdx 



dydy 



B A^ B 



(191) 

are as in the linear case, Sec. VI A, except that here the matrices
depend on the parameter $x_0$. This parameter is
a constant in the equation for $\delta x$. Therefore the solution
is completely equivalent to that of the linear case. We
obtain for m('^(io) = N E{dxt6xQ\xQ), in the Laplace- 
transformed form, 



mf^ixo) 

d 



l+( (s itD + m,.)+ (Ao' 
dh 



dxo 

1 

■ rhsixo), 

(192) 



For times large enough that the short time-scales
corresponding to the poles of eq. (192) have decayed to
zero, one thus finds the approximation, similar to eq. (91)
(or better eq. ([M]) ) 



myiy)^iAy)-^iy)-G-\x)-iAy)-^iy) 



(193) 



These results are fully rigorous in the thermodynamic 
limit (except for the restriction that B is taken constant). 



VIII. CONCLUSIONS AND DISCUSSION 

We performed a careful derivation of the generalized
Langevin equation. In performing this derivation we found
that the Zwanzig formalism is superior to the others
(Mori, Robertson, Grabert). The reason is that the
canonical ensemble is not well suited to the case where
fluctuations are dominant. To be able to perform non-equilibrium
predictions using the Robertson and Grabert
formalism one needs to perform quite artificial adaptations
of the Mori formalism. The Zwanzig-flavor projection
operator formalism does not have these problems.
The important underlying reason is that this projection
is optimal in the sense of optimal prediction theory.
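
The sense in which a conditional expectation is optimal can be illustrated with a small numerical experiment. The sketch below uses a hypothetical toy model (a Gaussian coarse variable `x`, an unresolved variable `z`, and an observable `A = x**2 + 0.5*z`), none of which comes from the text. It compares the mean-square error of a binned estimate of E(A|X) with that of the best predictor that is linear in X, which is the form a linearized projection yields.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
x = rng.normal(size=n)          # coarse-grained variable X (toy model)
z = rng.normal(size=n)          # unresolved degree of freedom
A = x**2 + 0.5 * z              # observable that is nonlinear in X

# Conditional expectation E(A|X), estimated by averaging A within bins of X.
n_bins = 80
edges = np.linspace(-4.0, 4.0, n_bins + 1)
idx = np.clip(np.digitize(x, edges) - 1, 0, n_bins - 1)
sums = np.bincount(idx, weights=A, minlength=n_bins)
counts = np.bincount(idx, minlength=n_bins)
cond_mean = sums / np.maximum(counts, 1)
pred_conditional = cond_mean[idx]

# Best predictor that is linear in X.
dx = x - x.mean()
slope = np.mean(A * dx) / np.mean(dx * dx)
pred_linear = A.mean() + slope * dx

mse_conditional = np.mean((A - pred_conditional) ** 2)
mse_linear = np.mean((A - pred_linear) ** 2)
print(mse_conditional, mse_linear)
```

In this toy model the conditional expectation leaves only the variance of the unresolved part `0.5*z` as error, whereas the linear predictor also misses the whole nonlinear dependence on `x`.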



The derivation, as a result, gives a microcanonical en- 
tropy definition. This definition is objective and depends 
on which macroscopic variables, X, are used. The en- 
tropy is the logarithm of the density of states (Liouville 
measure) per unit volume X (Lebesgue measure). To 
compute the entropy, one should take into account all 
microstates consistent with a macro state and not only 
the states actually sampled. Entropy arises in dynamic 
equations because it measures the amount of phase space 
available when a system changes its coarse-grained state 
X. If there is more phase-space available there is a bias 
to go to that state. This is the thermodynamic driv- 
ing force. The ergodic point of view that entropy has 
to do with phase space visited in a certain time is not 
supported by our analysis. 

To illustrate this statement let us consider the entropy
of a high-molecular-weight, entangled polymer melt.
Upon deformation the polymer chains get stretched (on
average). Subsequently the polymer conformations will
try to relax towards equilibrium. Initially this relaxation
is quick, but soon polymer molecules will start feeling
their neighbors. Because the melt is entangled, relaxation
slows down. According to the theory of Doi and
Edwards [32], conformations will be confined to a tube-like
region. The contour length and the cross-sectional
area of the tube are independent of the deformation. A
polymer can only relax further by escaping the tube (so-called
reptation). So, there is a two-step process of relaxation,
namely a fast process of the chain inside the
tube and a slow one of the tube itself. There is a big gap
between the characteristic time scales.

Here comes the point. Suppose that after a step strain and
subsequent fast relaxation inside the tube one characterizes
the state by the strain. One wants to know the
entropy as a function of the strain. One might think the
entropy can be computed from the number of chain conformations
sampled by a chain inside each tube. Since
the contour length and radius of the tube are not changed
after deformation (and subsequent relaxation) one finds
that this phase-space volume is independent of strain. The
mistake is that the number of tube configurations
consistent with the strain deformation should also
be taken into account, although these conformations are
almost static on the time-scale under consideration. All
entropy comes from this contribution.

Entropy is not a scalar quantity, so upon a change
of variables extra terms appear. In the thermodynamic
limit these terms are negligible. For small systems,
however, they have consequences. In this case the current
entropy definition deviates from other ones such
as the Gibbs entropy. Because of the rigorous connection
with microscopic dynamics through the Zwanzig projection
operator formalism, the current entropy definition is
proved to be the correct one to use. If one approximates
the governing equation with a stochastic differential
equation the non-scalar transformation rule is essential.
Only when the entropy is allowed to transform in
this way does the form of the equation not change upon a
change of coordinates, as follows from Ito calculus. The
Langevin equation poses no restriction on the set of variables
one uses to describe a system. The choice should be
motivated by the problem at hand. What determines a
good choice is the decorrelation behavior of fluctuations
of the macroscopic variables. If they decorrelate quickly
the formal generalized Langevin equation can be approximated
by a practically useful stochastic equation. Only
the generalized Langevin equation can be rigorously approximated
by a stochastic differential equation. The
reason is that the fluctuating contributions can be seen as
a path through the X-space, such that Af'"^^ = 
for functions A{X). This equality does not hold for the 
other projection-operator flavors. 
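
The non-scalar transformation of the entropy can be made concrete with a one-dimensional toy example. Since the entropy is the logarithm of a density of states per unit volume of the coarse variable, a change of variables Y = f(X) adds a Jacobian term, S_Y(y) = S_X(x) - ln|dy/dx|. The sketch below assumes a hypothetical system whose microstates are uniformly distributed on [0, 1], with X equal to the microstate and Y = X^2; these choices and all names are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(2)
gamma = rng.uniform(0.0, 1.0, size=1_000_000)   # uniform measure on microstates
X = gamma                # coarse variable X: density of states 1, so S_X = 0
Y = X**2                 # alternative coarse variable Y = f(X)

def entropy_from_samples(values, edges):
    """Log of the sampled density of states per unit volume of the coarse variable."""
    counts, _ = np.histogram(values, bins=edges)
    density = counts / (len(values) * np.diff(edges))
    return np.log(density)

edges_y = np.linspace(0.1, 0.9, 41)
centers_y = 0.5 * (edges_y[:-1] + edges_y[1:])
S_Y_sampled = entropy_from_samples(Y, edges_y)

# Transformation rule: S_Y(y) = S_X(x) - ln|dy/dx| with dy/dx = 2x = 2 sqrt(y)
S_Y_predicted = 0.0 - np.log(2.0 * np.sqrt(centers_y))

print(np.max(np.abs(S_Y_sampled - S_Y_predicted)))
```

The sampled entropy of Y agrees with the Jacobian-corrected prediction up to sampling and binning error, although S_X is constant. Treating the entropy as a scalar would wrongly predict a constant S_Y as well.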

We provided the equations of successive coarse- 
graining. A stochastic differential equation for the 
coarse-grained variable Y can be provided if one knows 
the statistics of its fluctuations ya.fluc*. Besides giv- 
ing the general equation governing these fluctuations, 
eq. ([5^ . we studied and solved it for the linear regime 
and the thermodynamic-limit case. The general picture
that follows from this is the following. The only fluctuations
in the fine-scale, X, that are important to determine
are fluctuations that do change Y. Fluctuations
that do not influence Y will stay close to $\bar{X}$. Here
$\bar{X}$ is the state of maximum entropy, $S[X]$, that obeys the
constraint $Y(\bar{X}) = Y(X)$. These irrelevant fluctuations are
filtered out by a matrix Q. We gave an exact recipe how
to compute this matrix.

The procedure also indicates for which time-scales the
coarse-grained equation is expected to be valid. Note
that when motion of X out of the Y(X) is only slowly
relaxing, these times can be very long. Introducing new
variables that capture this slow mode is then beneficial.
This is the art of coarse-graining.

In several treatments, e.g. [1, [s^, expressions for suc- 
cessive coarse-graining are presented for stochastic differ- 
ential equations. Typically, somewhere in these deriva- 
tion Xf'^^'^^ is replaced by The argument used for 
making this assumption is that y is a very slow variable. 
When obtaining in this way one should only consider 
correlations on time-scales where Y is indeed slow. As we 
showed in our derivation, however, to flnd the full 
one needs to wait until all irrelevant fluctuations have 
relaxed. Therefore, when there is no wide separation of 
time-scale, methods like those provided in (3l.[33j will not 
give accurate results. 

In our procedure we give an exact equation for ^y^^"'^* 
In this equation the thermodynamic driving force that
relaxes Y is eliminated. Therefore the limit $t \to \infty$ can be
evaluated. Our relations reduce in a certain limit to
homogenization theory, but they are more general. When
transport coefficients are derived in the thermodynamic
limit they usually exhibit long-time tails. We believe that
our method provides a cutoff time for this tail that depends
on the choice of coarse-grained variables Y. Below the cutoff the
tail will be present and be part of the transport coefficient.
Above the cutoff time the physical phenomenon that
causes the tail is part of the coarse-grained equation.

The motivation for this research came from the need
for coarse-graining methods in computational methods.
When coarse-graining from a molecular to a mesoscopic
level, thermodynamic-limit assumptions are usually not
valid. We therefore hope that the method outlined in
this paper will help to bridge the scales in these kinds of
methods.

A second application we have in mind is coarse-graining
as an alternative to discretization. The coarse-graining
procedure gives a recipe to generate equations
using a finite number of variables. This is typically what
is done when performing a discretization. Because
coarse-graining is physical, however, we expect that it may give
rise to more stable methods. The thermodynamics is
obeyed, so reversible parts are isentropic and irreversible
parts are entropy increasing (in the absence of fluctuations).
An obvious framework to apply the method to is stabilization
in the finite element method [34].



APPENDIX A: MORI AND 
ROBERTSON/GRABERT PROJECTION 
OPERATORS 

The goal is to devise projection operators that link
the microscopic description of a system to a coarser description.
Instead of characterizing a system with a microscopic
state $\Gamma$ one would like to use a macroscopic
(or mesoscopic) state X. The space of macro states is
assumed to be much lower dimensional. A macro state
characterizes a subspace of the microscopic space, namely
the set of microstates for which $X(\Gamma)$ has the same value.

To make a direct link with statistical mechanics, in the
Mori and Robertson/Grabert formalisms one tries to express
projections to the macro state as an expectation
value over an ensemble. To link the microscopic and macroscopic
spaces a relevant (probability) measure is
introduced.

The microstates are assigned a portion of the statistical 
weight of the micro state. Typically (but not necessarily) 
generalized canonical ensembles are used. In this case 

\[
d\mu^{\mathrm{rel}}(X)[\Gamma] = \frac{\exp[-\lambda(X) \cdot X(\Gamma)]}{Z(X)}\, d\mu_L[\Gamma]. \tag{A1}
\]

The convention we will use is the following. When using
this measure in an integral the value in the round brackets
is fixed while $\Gamma$ is integrated over. So, when integrating
over the microscopic space, $\Gamma$, the $\lambda(X)$ and $Z(X)$ are
fixed, but $X(\Gamma)$ varies with $\Gamma$. Following the usual framework
for canonical ensembles one has $Z(X) = Z(\lambda(X))$,
where

\[
Z(\lambda) = \int \exp[-\lambda \cdot X(\Gamma)]\, d\mu_L[\Gamma]. \tag{A2}
\]

The functional relations of $\lambda$ on $X$ are such that

X = (A, /.-1(A)) = / X(F) V<=n^)[r], (A3) 






which gives the usual 

d\nZ{\) ^ ^, , 91n5C(X) 
X{\) ^ ^ \ and \{X) = — ^ — (A4) 



9X 



where 



S^{X) = X{X)X + \iiZ{X). 



(A5) 
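
The generalized canonical relations above can be checked on a small discrete system. The sketch below assumes the sign conventions that can be read from eqs. (A2) and (A5), namely Z(λ) as a sum of exp[-λ X(Γ)] over microstates and S = λ X + ln Z, and verifies by finite differences that X(λ) = -∂ln Z/∂λ and λ = ∂S/∂X. The five-state system and all function names are hypothetical.

```python
import numpy as np

# Toy system: five discrete microstates with a scalar coarse variable X(Gamma).
X_micro = np.array([0.0, 1.0, 1.0, 2.0, 3.0])

def log_Z(lam):
    """Logarithm of the partition sum Z(lam) = sum_Gamma exp[-lam X(Gamma)]."""
    return np.log(np.sum(np.exp(-lam * X_micro)))

def mean_X(lam):
    """Canonical expectation value of X at multiplier lam."""
    w = np.exp(-lam * X_micro)
    return np.sum(X_micro * w) / np.sum(w)

def entropy(lam):
    """Entropy S = lam X + ln Z, parametrized by lam."""
    return lam * mean_X(lam) + log_Z(lam)

lam, h = 0.7, 1e-5
# Central differences for d ln Z / d lam and for dS/dX along the curve X(lam)
dlogZ_dlam = (log_Z(lam + h) - log_Z(lam - h)) / (2 * h)
dS_dX = (entropy(lam + h) - entropy(lam - h)) / (mean_X(lam + h) - mean_X(lam - h))

print(np.isclose(mean_X(lam), -dlogZ_dlam, atol=1e-6))   # X = -d ln Z / d lam
print(np.isclose(dS_dX, lam, atol=1e-6))                 # lam = dS / dX
```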



One might be tempted to interpret the expectation
value of $A(\Gamma)$ with respect to this measure as a projection
of A onto X,



{V''A){X) ^ {A,f^^-^\X)) = J A{r)df^^-^\X)[T]. (A6) 
However, V'~^ is not a projection-operator, since 

{V'^V'^A){X) = ((7'^A)(X),Ai"i(^)> = 

{V''A){X(^})d^l'^^\X)[T] ^ (P^ (A7) 



in general. Therefore the canonical expectation value cannot
be interpreted as a projection (which, by definition,
obeys $\mathcal{P}^2 = \mathcal{P}$).

One can make either of two choices if one wants to
proceed. The first is to keep this operator and accept that it is not a
projection operator. Using the formalism and decomposing
the equation according to eq. (29) gives a fluctuating
term that does not obey eq. (31). Clearly, this is not done
within the projection operator formalism. Eq. (31) is the
main identity that is used to make subsequent approximations.
Therefore this choice is not to be preferred. A
second choice is to change the operator a bit such that it
becomes a projection operator. In this case the
projection can only be identified approximately as an expectation
value over the canonical ensemble. Therefore
part of the link to equilibrium statistical mechanics is
lost. The second choice is, in our opinion, a less than
elegant fix.

One fix is to linearize the relevant measure with respect
to X. This is done both in the Mori and the Robertson
flavor of the projection operator formalism. In the Mori
flavor one linearizes around the equilibrium state $X = x^{\mathrm{eq}}$,



= {l~ 5\- 6X{T)) 



9d/i'-°\x^i)[r] 



(l-(5A-5X(r))d//°'(a;'=<i)[r] 
(l-(5A(r) •5X)d//'=i(a;'=<i)[r], 



(A8) 



where $\delta X(\Gamma) = X(\Gamma) - x^{\mathrm{eq}}$ and



5X = 5X 



( d'^lnZ 



5X 



fdx- 



,oq 



sx 



(A9) 



= -{sxsx,fi'''\x'''i))-^ - sx, 

such that 

d^'''^-^{X)[r] = d/i'^°'(a;°'i)[r]x 

(^1 + SX{T) ■ {SX 6X, • 5X^ . (AlO) 

Because of the fact that SX appears only linearly the 
Mori expectation value can be expressed as 



= A'^^- (A(5A,m'°'(2:"'1)) - SX 



A'' 



nA ■ SX, 



(All) 



with 



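\[
\Omega_A = \langle A\, \delta X, \mu^{\mathrm{rel}}(x^{\mathrm{eq}}) \rangle \cdot \langle \delta X\, \delta X, \mu^{\mathrm{rel}}(x^{\mathrm{eq}}) \rangle^{-1}. \tag{A12}
\]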

Applying this operation multiple times gives the same
result (since $\Omega_X = 1$). Therefore this operation defines a
projection



{V'^A){X) = (^,/<'''^(X)) 



(A13) 



Due to the linearization around the equilibrium the result 
can be expected to be useful near equilibrium only. 
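
The statement that the linearized expectation value is a genuine projection can be verified numerically. In the sketch below, sample averages over synthetic data stand in for expectation values with respect to the equilibrium measure; the data, the observable `A` and the function name `mori_project` are hypothetical. The function returns the mean of an observable plus its coefficient matrix (the covariance with δX times the inverse covariance of δX) contracted with δX.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50_000
# Synthetic samples standing in for the equilibrium ensemble; X has two components.
X = rng.normal(size=(n, 2))
X[:, 1] += 0.5 * X[:, 0]
A = np.sin(X[:, 0]) + X[:, 1] ** 2 + rng.normal(size=n)

def mori_project(F, X):
    """Return <F> + Omega_F . dX with Omega_F = <F dX> . <dX dX>^-1."""
    dX = X - X.mean(axis=0)
    cov = dX.T @ dX / len(X)
    omega = np.linalg.solve(cov, dX.T @ F / len(X))
    return F.mean() + dX @ omega

PA = mori_project(A, X)
PPA = mori_project(PA, X)            # applying the operation a second time
PX0 = mori_project(X[:, 0], X)       # projecting a component of X itself

dX = X - X.mean(axis=0)
residual_corr = dX.T @ (A - PA) / n  # correlation of the remainder with dX

print(np.allclose(PPA, PA))              # P P A = P A
print(np.allclose(PX0, X[:, 0]))         # P X = X
print(np.allclose(residual_corr, 0.0))   # remainder is uncorrelated with dX
```

The second check reflects the fact that the coefficient matrix for X itself is the identity, and the third shows that what is left after projection has no linear correlation with δX.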

For the Mori projection operator one always assumes
that the equilibrium distribution is invariant, i.e.,
$\mathcal{L}\mu^{\mathrm{rel}}(x^{\mathrm{eq}}) = 0$. This is obeyed if $\mathcal{L}\,(\lambda(x^{\mathrm{eq}}) \cdot X) = 0$.
Therefore the linear combination given by $\lambda(x^{\mathrm{eq}}) \cdot X$
should be a conserved quantity. When this requirement
is obeyed one finds, using eq. (A10), that



= -(Af-;,/:M'^°'-^(Xo)) 
= (if-;A:o,/i'-'='(x'='i))- 

{SXSX,^i'''\x°'i))-^ - SX, 



(AM) 



where SX = Xq — x^'^. Letting, exp[£s] act on it, as in 
eq. gives SXs = exp[Cs]{Xo - x^^) ^Xs- So, 
we have that 

exp[Cs]V^CAf'^f = Xo,Ai'°'(a;'='i)) • \^{Xs). 

(A15) 

The driving force can be seen as a linearization of A'^'(Ar) 
of the canonical X{X) around x^"^ where it should be 
noted that the time correlation matrix times X'^'^ equals 
zero because Xq ■ X'^^ = 0. Alternatively one can intro- 
duce an entropy, 

S^{X) ^ X'^'i ■ SX + ^{SX SX, ^i"'\x'''i))'^ : {SX SX), 
dS^{X) 



A^(X) = 



dX 



(A16) 

The final linear generalized Langevin equation, eq. (29),
that arises from the Mori formalism is

At ^ + Q.A ■ SXt+ 
ft 

{Af:^'i'Xo, m"H^''')> • X^{Xt')dt' + if"^', (A17) 

The first two terms combined are the expectation value
of $A(\Gamma)$ with respect to the linearized (Mori) measure at $X_t$.
The last two terms in eq. (A17) are clearly related by a
fluctuation-dissipation relation. The memory-integral term
appears as a thermodynamic term. Here the driving force is a
derivative of the entropy. The time correlation appearing in the
memory integral correlates fluctuations of $A$ with the time
derivative of the macroscopic quantities $X$.

For the case of At = Xt the equation becomes trivial. 
Since X^"'^* = Xq - PXq = Xq- Xq = 0, X^"""^ re mains 
zero according to eq. (^5)1 . Also, fix = 1, so eq. (IA17P 
reduces to $X_t = X_t$. The most commonly used equation
for further approximations starts with $A_t = \mathcal{L}X_t$. In this
case one finds

Xt = ^-5Xt+ f {Xf^^^Xa,^i"'\x'''i))■X^{Xt')dt'+Xf^'\ 
Jo 

(A18) 

where $\Omega = \Omega_{\mathcal{L}X}$.

Further developments along these lines are due to
Robertson [35] and Grabert [13]. They introduced a linearization
around a state $x_t$ (that evolves with time), so


= dn''''\xt)[T] + 6X 



ddfi'^'ixtm 



dxt 



(A19) 



where = indicates that the subsequent expression should 
be linearized with respect to SX, and 

{V''{t)A){X) = {A,f,^°\xt)) + 

{ASX,^i'''\xt)) ■ {SX6X,fi'''\xt))-'^ ■ SX, (A20) 

with $\delta X = X - x_t$. Operators like $\mathcal{L}$ and $\mathcal{P}(t)$ act on
$X$, but not on $x_t$. Here we used the assumed generalized
canonical shape of eq. (A1). Since $x_t$ is time
dependent one finds a projection operator that is time-dependent.
For all flavors of $\mathcal{P}$ one has $\mathcal{P}X = X$, and
the projection results in a linear expression in
$X$.

When constructing a fluctuating quantity the property
one wants to satisfy is a generalization of eq. (31),
namely,



fluct 



no)Art' 



0. 



(A21) 



The reason one wants to use $\mathcal{P}(0)$ is that, as we will
see further on, this corresponds to the initial ensemble.
The expectation values of fluctuations with respect to the



initial ensembles are made to equal zero. The generaliza- 
tion of the fluctuating term, eq. (pS)) . used by Grabert 
[13] that obeys this property is 



= Q(t') |r„ exp[y £Q{t") dt"^ I Ao, (A22) 

so here V{t') Afi^^*" = for all t' . The exponent is reverse 
time-ordered. This means that, by definition. 



— r_exp / CQ{t")dt" 



The decomposition as given in eq. (j29p becomes 



CQ{t')r^e^Y> / CQ{t")dt" . (A23) 



At = exp[£t]-p(t)^o+ 

exp[/: t'] {C + ^) dt' -f Alj\ (A24) 

The differentiation to t' gives an extra term (proportional 
to Q{t')) besides the term Q{t')CApf. To further sim- 
plify the resulting expression we need to use properties of 
the projection operator as introduced by Grabert. This 
will be done below. 

For the time-dependent projection-operator we find 
that 



(A25) 



From this relation one can straightforwardly deduce that 
Q^{t)Q^{t') = Q^it), and taking the derivative with 
respect to i at i' = t gives Q^(t)Q"(i) = Q^(t). 

From the definition that Q^{t) is a projection oper- 
ator one can find that Q^{t) = V^{t)Q^{t)Q^{t) + 
Q^{t)Q^{t)V^{t). Combining these two facts we have 



QR(i) =pR(t)QR(t)gR(i). 



(A26) 



Using this relation to evaluate the derivative to t' in
eq. (A24) results in



£ + —) i«7 = {£- Q^{t')C + Q^\t')) if,-' 



= V{t'){C + Q^{t'))A^tr' 



(A27) 



Inserting this equality and the definition of into 
eq. (A24) gives

At^(A,//"'(^*)> + 

{C + Q^(t'))if,"f , M^H^t')) + 4T* (A28) 



Here the linearization of $X_t$ should be taken around $x_t$
and that of $X_{t'}$ around $x_{t'}$. The reason is that in eq. (A24)
$\exp[\mathcal{L}t']\,\mathcal{P}(t')$ occur together. The projection gives a
linearization around $x_{t'}$, and the operator $\exp[\mathcal{L}t']$ transforms
$X_0$ into $X_{t'}$. The action of $\dot{\mathcal{Q}}^R(t')$ can be deduced from
eq. (A19) as,

Q^{t')A^^V'Ht')A = -(A,i,,-^^^^^)-iX~x,). 

(A29) 

Using this formula for performing the linearization of
eq. (A28) gives

At = (A.fi'^'Hxt)) + f{A'^:'fXo,^i'-\xt.))-X{xt.)dt' 
Jo 

+ nA{xt)-5Xt 

I ^fiuct 
+ ^0,t ■ 

(A30) 

Here we still see the linearized character of the equation. 
The full term + if ■ d/dxf) can be interpreted as a 
total time derivative. 

Clearly one would like $x_t$ to follow $X_t$ closely. The
usual choice, made by Robertson and Grabert, is to assume
that $x_t$ equals the expectation value over an initial
canonical distribution,

\[
x_t = \langle \exp[\mathcal{L}t]\, X, \mu^{\mathrm{rel}}(x_0) \rangle. \tag{A31}
\]

With this choice {exp[£t]6Xt',n'''\xo)) = 0. Us- 
ing the same notation, also for quantities at = 
(exp[£i] A, fi'^°^{xo)), we find that 

at = {A,f/'\xt)) + 

[\cAf,^f,t^'^'^\xt>))ds. (A32) 
Jo 



Because all terms linear in SX cancel. An extra assump- 
tion used is to take Xq = xq then (^4q^, /L(™'(a;o)) = 
as a consequence of requirement eq. (|A2ip and 5Xq = 
inserted into eq. (jA20p '). If this assumption is not made 
a term {1^""=^ fi'°\xo)) 

For /i''''' equal to the generalized canonical ensemble 
one has 



such that 



at = {A,t/'\xt))+ /*(i«"f^o,M''°H^t')>-A(xtOdi', 
Jo 

(A34) 



when $X_0 = x_0$. The deviations from this average, given
by $\delta A_t$, are similar (also linear) to the Mori generalized
linear Langevin equation, eq. (A18). The difference is
the linearization around a time-dependent state $x_t$. This
gives a few extra terms.
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